NSF PAR Search | NSF Public Access Repository


Accelerating ML Workloads using GPU Tensor Cores: The Good, the Bad, and the Ugly

https://doi.org/10.1145/3629526.3653835

Hanindhito, Bagus; John, Lizy K (May 2024, ACM)

Full Text Available
Bandwidth Characterization of DeepSpeed on Distributed Large Language Model Training

https://doi.org/10.1109/ISPASS61541.2024.00031

Hanindhito, Bagus; Patel, Bhavesh; John, Lizy K (May 2024, IEEE)

Full Text Available
CoMeFa: Deploying Compute-in-Memory on FPGAs for Deep Learning Acceleration

https://doi.org/10.1145/3603504

Arora, Aman; Bhamburkar, Atharva; Borda, Aatman; Anand, Tanmay; Sehgal, Rishabh; Hanindhito, Bagus; Gaillardon, Pierre-Emmanuel; Kulkarni, Jaydeep; John, Lizy K. (July 2023, ACM Transactions on Reconfigurable Technology and Systems)

Block random access memories (BRAMs) are the storage houses of FPGAs, providing extensive on-chip memory bandwidth to the compute units implemented using logic blocks and digital signal processing slices. We propose modifying BRAMs to convert them to CoMeFa (Compute-in-Memory Blocks for FPGAs) random access memories (RAMs). These RAMs provide highly parallel compute-in-memory by combining computation and storage capabilities in one block. CoMeFa RAMs utilize the true dual-port nature of FPGA BRAMs and contain multiple configurable single-bit bit-serial processing elements. CoMeFa RAMs can be used to compute with any precision, which is extremely important for applications like deep learning (DL). Adding CoMeFa RAMs to FPGAs significantly increases their compute density while also reducing data movement. We explore and propose two architectures of these RAMs: CoMeFa-D (optimized for delay) and CoMeFa-A (optimized for area). Compared to existing proposals, CoMeFa RAMs do not require changing the underlying static RAM technology, such as simultaneously activating multiple wordlines on the same port, and are practical to implement. CoMeFa RAMs are especially suitable for parallel and compute-intensive applications like DL, but these versatile blocks are also useful in diverse domains such as signal processing and databases, among others. By augmenting an Intel Arria 10–like FPGA with CoMeFa-D (CoMeFa-A) RAMs at the cost of 3.8% (1.2%) area, and with algorithmic improvements and efficient mapping, we observe a geomean speedup of 2.55× (1.85×) across microbenchmarks from various applications and a geomean speedup of up to 2.5× across multiple deep neural networks. Replacing all or some BRAMs with CoMeFa RAMs in FPGAs can make them better accelerators of DL workloads.
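The abstract's point that single-bit bit-serial processing elements can compute at any precision can be illustrated with a small software model. The sketch below is not CoMeFa's circuit or mapping; it is a generic bit-serial adder, and the names `to_bits`, `from_bits`, and `bit_serial_add` are hypothetical. It assumes unsigned operands that fit in the chosen precision, with each "lane" standing in for one processing element.

```python
def to_bits(value, precision):
    """Return the low `precision` bits of a non-negative int, LSB first."""
    return [(value >> i) & 1 for i in range(precision)]


def from_bits(bits):
    """Rebuild a non-negative int from an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))


def bit_serial_add(xs, ys, precision):
    """Add xs[i] + ys[i] for every lane, one bit position per step.

    Assumption: operands are unsigned and fit in `precision` bits.
    The outer loop models time steps; the inner loop models lanes that
    would operate in lockstep in hardware.
    """
    lanes = len(xs)
    x_bits = [to_bits(x, precision) for x in xs]
    y_bits = [to_bits(y, precision) for y in ys]
    carry = [0] * lanes                 # one carry latch per lane
    sums = [[] for _ in range(lanes)]

    for i in range(precision):          # one step per bit position
        for lane in range(lanes):
            a, b, c = x_bits[lane][i], y_bits[lane][i], carry[lane]
            sums[lane].append(a ^ b ^ c)            # full-adder sum bit
            carry[lane] = (a & b) | (c & (a ^ b))   # full-adder carry out

    for lane in range(lanes):
        sums[lane].append(carry[lane])  # final carry becomes the top bit
    return [from_bits(s) for s in sums]


if __name__ == "__main__":
    print(bit_serial_add([3, 5, 15], [4, 6, 1], precision=4))
```

In this model, changing `precision` only changes the number of steps, not the per-lane logic, which is the sense in which a bit-serial element is precision-agnostic; the number of lanes sets the parallelism.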
Full Text Available
GAPS: GPU-acceleration of PDE solvers for wave simulation

https://doi.org/10.1145/3524059.3532373

Hanindhito, Bagus; Gourounas, Dimitrios; Fathi, Arash; Trenev, Dimitar; Gerstlauer, Andreas; John, Lizy K. (June 2022, ACM International Conference on Supercomputing (ICS))

Full Text Available
CoMeFa: Compute-in-Memory Blocks for FPGAs

https://doi.org/10.1109/FCCM53951.2022.9786179

Arora, Aman; Anand, Tanmay; Borda, Aatman; Sehgal, Rishabh; Hanindhito, Bagus; Kulkarni, Jaydeep; John, Lizy K. (May 2022, International Symposium on Field-Programmable Custom Computing Machines)

Full Text Available
Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory

https://doi.org/10.1145/3472456.3472512

Hanindhito, Bagus; Li, Ruihao; Gourounas, Dimitrios; Fathi, Arash; Govil, Karan; Trenev, Dimitar; Gerstlauer, Andreas; John, Lizy (August 2021, International Conference on Parallel Processing (ICPP))
Full Text Available
Demystifying the MLPerf Training Benchmark Suite

https://doi.org/10.1109/ISPASS48437.2020.00013

Verma, Snehil; Wu, Qinzhe; Hanindhito, Bagus; Jha, Gunjan; John, Eugene B.; Radhakrishnan, Ramesh; John, Lizy K. (August 2020, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS))

Full Text Available
The gem5 Simulator: Version 20.0+: A new era for the open-source computer architecture simulator

Lowe-Power, Jason; Ahmad, Abdul Mutaal; Akram, Ayaz; Alian, Mohammad; Amslinger, Rico; Andreozzi, Matteo; Armejach, Adrià; Asmussen, Nils; Bharadwaj, Srikant; Black, Gabe; et al. (July 2020, arXiv.org)

The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors, which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.
Full Text Available
