NSF PAR Search | NSF Public Access Repository

RASSM: Residue-based Acceleration of Single Sparse Matrix Computation via Adaptive Tiling

https://doi.org/10.1145/3669940.3707219

Jain, Anirudh; Gupta, Pulkit; Conte, Thomas M (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026

ASDF: A Compiler for Qwerty, a Basis-Oriented Quantum Programming Language

https://doi.org/10.1145/3696443.3708966

Adams, Austin J; Khan, Sharjeel; Bhamra, Arjun S; Abusaada, Ryan R; Cabrera, Anthony M; Hoechst, Cameron C; Humble, Travis S; Young, Jeffrey S; Conte, Thomas M (March 2025, ACM)

Free, publicly-accessible full text available March 1, 2026

Incompressible Navier–Stokes solve on noisy quantum hardware via a hybrid quantum–classical scheme

https://doi.org/10.1016/j.compfluid.2024.106507

Song, Zhixin; Deaton, Robert; Gard, Bryan; Bryngelson, Spencer H (February 2025, Computers & Fluids)

Free, publicly-accessible full text available February 1, 2026

Unleashing CPU Potential for Executing GPU Programs Through Compiler/Runtime Optimizations

https://doi.org/10.1109/MICRO61859.2024.00023

Han, Ruobing; Zhao, Jisheng; Kim, Hyesoon (November 2024, IEEE)

Full Text Available

A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks

https://doi.org/10.1145/3695794.3695816

Sheridan, Kevin; Dominguez-Trujillo, Jered; Shipman, Galen; Lavin, Patrick; Scott, Christopher; Vaca Valverde, Agustin; Vuduc, Richard; Young, Jeffrey (September 2024, ACM)

Full Text Available

Hunting the Needle - The Potential of Innovation in Architecture

Kogge, Peter M; McMahon, Janice; Dysart, Timothy J (September 2024, IEEE Conference on High Performance Extreme Computing)

Subgraph isomorphism uses a small graph as a pattern to identify, within a larger graph, a set of vertices whose edges match the pattern, and it is of increasing importance in many application areas. Such problems exhibit the potential for very significant fine-grain parallelism, with individual threads having short lifetimes while touching potentially “distant” memory objects in a very unpredictable and irregular fashion. This is difficult for conventional distributed memory systems to handle efficiently, but an alternative that combines cheap multi-threading with threads that can migrate freely through a large memory is a more natural fit. This paper demonstrates the potential of such an architecture by comparing its execution characteristics for a large graph to those of several conventional parallel implementations on modern but conventional architectures. The gains exhibited by the migrating threads are significant.
Full Text Available

Understanding Performance Implications of LLM Inference on CPUs

https://doi.org/10.1109/IISWC63097.2024.00024

Na, Seonjin; Jeong, Geonhwa; Ahn, Byung Hoon; Young, Jeffrey; Krishna, Tushar; Kim, Hyesoon (September 2024, IEEE)

Full Text Available

CuPBoP: Making CUDA a Portable Language

https://doi.org/10.1145/3659949

Han, Ruobing; Chen, Jun; Garg, Bhanu; Zhou, Xule; Lu, John; Young, Jeffrey; Sim, Jaewoong; Kim, Hyesoon (July 2024, ACM Transactions on Design Automation of Electronic Systems)

CUDA is designed specifically for NVIDIA GPUs and is not compatible with non-NVIDIA devices. Enabling CUDA execution on alternative backends could greatly benefit the hardware community by fostering a more diverse software ecosystem. To address the need for portability, our objective is to develop a framework that meets key requirements, such as extensive coverage, comprehensive end-to-end support, superior performance, and hardware scalability. Existing solutions that translate CUDA source code into other high-level languages, however, fall short of these goals. In contrast to these source-to-source approaches, we present a novel framework, CuPBoP, which treats CUDA as a portable language in its own right. Compared to two commercial source-to-source solutions, CuPBoP offers broader coverage and superior performance for the CUDA-to-CPU migration. Additionally, we evaluate the performance of CuPBoP against manually optimized CPU programs, highlighting the differences between CPU programs derived from CUDA and those that are manually optimized. Furthermore, we demonstrate the hardware scalability of CuPBoP by showcasing its successful migration of CUDA to AMD GPUs. To promote further research in this field, we have released CuPBoP as an open-source resource.
Full Text Available

Comprex: In-Network Compression for Accelerating IoT Analytics at Scale

https://doi.org/10.1109/MM.2023.3343498

Oliveira, Rafael; Gavrilovska, Ada (March 2024, IEEE Micro)

Full Text Available

Multifidelity Memory System Simulation in SST

https://doi.org/10.1145/3631882.3631890

Lavin, Patrick; Young, Jeffrey; Vuduc, Richard (October 2023, ACM)
