NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

ATAPP: Architecture and Technology Aware Power Predictor for Unseen FPGAs

Wei, Zhigang; Arora, Aman; Shriver, Emily; John, Lizy (September 2025, IEEE Digital Library)

Free, publicly-accessible full text available September 2, 2026
CarbonSet: A Dataset to Analyze Trends and Benchmark the Sustainability of CPUs and GPUs

https://doi.org/10.1145/3716368.3735235

Hu, Jiajun; Choppali_Sudarshan, Chetan; Clifford, Maxwell; Chhabria, Vidya; Arora, Aman (June 2025, ACM)

Free, publicly-accessible full text available June 29, 2026
Beyond the Surface: The Necessity for Detailed Metrics in Corporate Sustainability Reports

https://doi.org/10.1109/IGSC64514.2024.00035

Sudarshan, Chetan Choppali; Arora, Aman; Chhabria, Vidya A (November 2024, IEEE)

Full Text Available
Systolic Sparse Tensor Slices: FPGA Building Blocks for Sparse and Dense AI Acceleration

https://doi.org/10.1145/3706628.3708867

Taka, Endri; Huang, Ning-Chi; Chang, Chi-Chih; Wu, Kai-Chiang; Arora, Aman; Marculescu, Diana (February 2025, ACM)

Free, publicly-accessible full text available February 27, 2026
GreenFPGA: Evaluating FPGAs as Environmentally Sustainable Computing Solutions

Sudarshan, Choppalli Chetan; Arora, Aman; Chhabria, Vidya A (June 2024, ACM)

Full Text Available
Efficient Approaches for GEMM Acceleration on Leading AI-Optimized FPGAs

https://doi.org/10.1109/FCCM60383.2024.00015

Taka, Endri; Gourounas, Dimitrios; Gerstlauer, Andreas; Marculescu, Diana; Arora, Aman (May 2024, IEEE)

Full Text Available
LogicNets vs. ULEEN : Comparing two novel high throughput edge ML inference techniques on FPGA

Nag, Shashank; Susskind, Zachary; Arora, Aman; Bacellar, Alan_T_L; Dutra, Diego_L_C; Miranda, Igor_D_S; Kailas, Krishnan; John, Eugene_B; Breternitz, Mauricio; Lima, Priscila_M_V; et al (August 2024, IEEE Digital Library)

Full Text Available
MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine

https://doi.org/10.1109/ICFPT59805.2023.00016

Taka, Endri; Arora, Aman; Wu, Kai-Chiang; Marculescu, Diana (December 2023, IEEE)

Full Text Available
HLSDataset: Open-Source Dataset for ML-Assisted FPGA Design using High Level Synthesis

https://doi.org/10.1109/ASAP57973.2023.00040

Wei, Zhigang; Arora, Aman; Li, Ruihao; John, Lizy (July 2023, International Conference on Application-specific Systems, Architectures and Processors)

Full Text Available
Infinity Stream: Portable and Programmer-Friendly In-/Near-Memory Fusion

https://doi.org/10.1145/3582016.3582032

Wang, Zhengrong; Liu, Christopher; Arora, Aman; John, Lizy; Nowatzki, Tony (March 2023, ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

In-memory computing with large last-level caches is promising to dramatically alleviate data movement bottlenecks and expose massive bitline-level parallelization opportunities. However, key challenges from its unique execution model remain unsolved: automated parallelization, transparently orchestrating data transposition/alignment/broadcast for bit-serial logic, and mixing in-/near-memory computing. Most importantly, the solution should be programmer friendly and portable across platforms. Our key innovation is an execution model and intermediate representation (IR) that enables hybrid CPU-core, in-memory, and near-memory processing. Our IR is the tensor dataflow graph (tDFG), which is a unified representation of in-memory and near-memory computation. The tDFG exposes tensor-data structure information so that the hardware and runtime can automatically orchestrate data management for bitserial execution, including runtime data layout transformations. To enable microarchitecture portability, we use a two-phase, JIT-based compilation approach to dynamically lower the tDFG to in-memory commands. Our design, infinity stream, is evaluated on a cycle-accurate simulator. Across data-processing workloads with fp32, it achieves 2.6× speedup and 75% traffic reduction over a state-of-the-art near-memory computing technique, with 2.4× energy efficiency.
more » « less
Full Text Available

« Prev Next »

Search for: All records