NSF PAR Search | NSF Public Access Repository

Florets for Chiplets: Data flow-aware high-performance and energy-efficient Network-on-Interposer for CNN inference tasks.

Harsh Sharma, Lukas Pfromm (July 2023, Proc. 2023 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS, ESWEEK))

Full Text Available
GraphIte: Accelerating Iterative Graph Algorithms on ReRAM Architectures via Approximate Computing

https://doi.org/10.23919/DATE56975.2023.10137001

Choudhury, Dwaipayan; Kalyanaraman, Ananth; Pande, Partha (April 2023, Proc. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE))

Full Text Available
Dynamic Community Detection Decouples Multiple Time Scale Behavior of Complex Chemical Systems

https://doi.org/10.1021/acs.jctc.2c00454

Zarayeneh, Neda; Kumar, Nitesh; Kalyanaraman, Ananth; Clark, Aurora E. (December 2022, Journal of Chemical Theory and Computation)

Full Text Available
IMpart: A Partitioning-based Parallel Approach to Accelerate Influence Maximization

https://doi.org/10.1109/HiPC56025.2022.00028

Barik, Reet; Minutoli, Marco; Halappanavar, Mahantesh; Kalyanaraman, Ananth (December 2022, Proceedings of the International Conference on High Performance Computing, Data, and Analytics (HiPC))

Full Text Available
HBMax: Optimizing Memory Efficiency for Parallel Influence Maximization on Multicore Architectures

https://doi.org/10.1145/3559009.3569647

Chen, Xinyu; Minutoli, Marco; Tian, Jiannan; Halappanavar, Mahantesh; Kalyanaraman, Ananth; Tao, Dingwen (October 2022, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques)

Full Text Available
Accelerating Graph Computations on 3D NoC-enabled PIM Architectures

https://doi.org/10.1145/3564290

Choudhury, Dwaipayan; Xiang, Lizhi; Rajam, Aravind Sukumaran; Kalyanaraman, Ananth; Pande, Partha Pratim (October 2022, ACM Transactions on Design Automation of Electronic Systems)

Graph application workloads are dominated by random memory accesses with poor locality. To tackle the irregular and sparse nature of computation, ReRAM-based Processing-in-Memory (PIM) architectures have been proposed recently. Most of these ReRAM architecture designs have focused on mapping graph computations into a set of multiply-and-accumulate (MAC) operations. ReRAMs also offer a key advantage in reducing memory latency between cores and memory by allowing for processing in memory. However, when implemented on a ReRAM-based manycore architecture, graph applications still pose two key challenges: significant storage requirements (particularly due to wasted zero cell storage), and a significant amount of on-chip traffic. To tackle these two challenges, in this paper we propose the design of a 3D NoC-enabled ReRAM-based manycore architecture. First, our proposed architecture incorporates a novel crossbar-aware node reordering to reduce ReRAM storage requirements. Second, its 3D NoC-enabled design reduces on-chip communication latency. Our architecture outperforms the state-of-the-art in ReRAM-based graph acceleration by up to 5x in performance while consuming up to 10.3x less energy for a range of graph inputs and workloads.
Full Text Available
Software/Hardware Co-design of 3D NoC-based GPU Architectures for Accelerated Graph Computations

https://doi.org/10.1145/3514354

Choudhury, Dwaipayan; Barik, Reet; Rajam, Aravind Sukumaran; Kalyanaraman, Ananth; Pande, Partha Pratim (June 2022, ACM Transactions on Design Automation of Electronic Systems)

Manycore GPU architectures have become the mainstay for accelerating graph computations. One of the primary bottlenecks to the performance of graph computations on manycore architectures is data movement. Since most of the accesses in graph processing are due to vertex neighborhood lookups, locality in graph data structures plays a key role in dictating the degree of data movement. Vertex reordering is a widely used technique to improve data locality within graph data structures. However, these reordering schemes alone are not sufficient, as they need to be complemented with efficient task allocation on manycore GPU architectures to reduce latency due to local cache misses. Consequently, in this article, we introduce a software/hardware co-design framework for accelerating graph computations. Our approach couples an architecture-aware vertex reordering with a priority-based task allocation technique. As the task allocation aims to reduce on-chip latency and associated energy, the choice of Network-on-Chip (NoC) as the communication backbone in the manycore platform is an important parameter. By leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D SWNoC-enabled software/hardware co-design framework achieves 11.1% to 22.9% performance improvement and 16.4% to 32.6% less energy consumption, depending on the dataset and the graph application, when compared to the default ordering of the dataset running on a conventional planar mesh architecture.
High-Performance and Energy-Efficient 3D Manycore GPU Architecture for Accelerating Graph Analytics

https://doi.org/10.1145/3482880

Choudhury, Dwaipayan; Rajam, Aravind Sukumaran; Kalyanaraman, Ananth; Pande, Partha Pratim (January 2022, ACM Journal on Emerging Technologies in Computing Systems)

Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real-world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip using graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms the traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns in a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption. We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework that integrates 3D memory (like Micron's HMC) with a massive number of GPU cores achieves 29.5% performance improvement and 30.03% less energy consumption on average compared to a conventional planar Mesh-based design with external DRAM.
Full Text Available
PaKman: A Scalable Algorithm for Generating Genomic Contigs on Distributed Memory Machines

https://doi.org/10.1109/TPDS.2020.3043241

Ghosh, Priyanka; Krishnamoorthy, Sriram; Kalyanaraman, Ananth (May 2021, IEEE Transactions on Parallel and Distributed Systems)

Delta-Screening: A Fast and Efficient Technique to Update Communities in Dynamic Graphs

https://doi.org/10.1109/TNSE.2021.3067665

Zarayeneh, Neda; Kalyanaraman, Ananth (April 2021, IEEE Transactions on Network Science and Engineering)
