NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

RHODES: Robust Optimization for Uncertainty-Aware Design of CO2-Efficient Computing Systems

Elgamal, Mariam; Mahmoud, Abdulrahman; Wu, Carole-Jean; Wei, Gu-Yeon; Brooks, David; Hills, Gage (January 2026, International Symposium on Computer Architecture (ISCA) -- INITIAL SUBMISSION)

Free, publicly-accessible full text available January 7, 2027
CORDOBA: Carbon-Efficient Optimization Framework for Computing Systems

Elgamal, Mariam; Carmean, Doug; Ansari, Elnaz; Zed, Okay; Peri, Ramesh; Manne, Srilatha; Gupta, Udit; Wei, Gu-Yeon; Brooks, David; Hills, Gage; et al (March 2025, High Performance Computer Architecture (HPCA))

Free, publicly-accessible full text available March 1, 2026
CAMEL: Co-Designing AI Models and eDRAMs for Efficient On-Device Learning

https://doi.org/10.1109/HPCA57654.2024.00071

Zhang, Sai Qian; Tambe, Thierry; Cuevas, Nestor; Wei, Gu-Yeon; Brooks, David (March 2024, IEEE)
$S^3$: Increasing GPU Utilization during Generative Inference for Higher Throughput

Jin, Yunho; Wu, Chun-Feng; Brooks, David; Wei, Gu-Yeon (December 2023, Advances in neural information processing systems)

Full Text Available
Silent Data Corruption in Robot Operating System: A Case for End-to-End System-Level Fault Analysis Using Autonomous UAVs

https://doi.org/10.1109/TCAD.2023.3332293

Hsiao, Yu-Shun; Wan, Zishen; Jia, Tianyu; Ghosal, Radhika; Mahmoud, Abdulrahman; Raychowdhury, Arijit; Brooks, David; Wei, Gu-Yeon; Reddi, Vijay Janapa (April 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
MAVFI: An End-to-End Fault Analysis Framework with Anomaly Detection and Recovery for Micro Aerial Vehicles

Jia, Tianyu; Ghosal, Radhika; Mahmoud, Abdulrahman; Raychowdhury, Arijit; Brooks, David; Wei, Gu-Yeon; Reddi, Vijay Janapa (January 2023, Design Automation and Test in Europe (DATE))

Full Text Available
GoldenEye: A Platform for Evaluating Emerging Numerical Data Formats in DNN Accelerators

https://doi.org/10.1109/DSN53405.2022.00031

Mahmoud, Abdulrahman; Tambe, Thierry; Aloui, Tarek; Brooks, David; Wei, Gu-Yeon (June 2022, 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN))

Full Text Available
Trireme: Exploration of Hierarchical Multi-Level Parallelism for Hardware Acceleration

https://doi.org/10.1145/3580394

Zacharopoulos, Georgios; Ejjeh, Adel; Jing, Ying; Yang, En-Yu; Jia, Tianyu; Brumar, Iulian; Intan, Jeremy; Huzaifa, Muhammad; Adve, Sarita; Adve, Vikram; et al (January 2023, ACM Transactions on Embedded Computing Systems)

The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme , a fully automated tool-chain that explores multiple levels of parallelism and produces domain specific accelerator designs and configurations that maximize performance, given an area budget. FPGA SoCs were used as target platforms and Catapult HLS [7] was used to synthesize RTL using a commercial 12nm FinFET technology. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20 ×, as well as a speedup of up to 37 × for smaller applications, compared to software-only implementations.
more » « less
Full Text Available
22.9 A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management

https://doi.org/10.1109/ISSCC42615.2023.10067817

Tambe, Thierry; Zhang, Jeff; Hooper, Coleman; Jia, Tianyu; Whatmough, Paul N.; Zuckerman, Joseph; Santos, Maico Cassel; Loscalzo, Erik Jens; Giri, Davide; Shepard, Kenneth; et al (February 2023, 2023 IEEE International Solid- State Circuits Conference (ISSCC))

Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive transformer layers, each layer containing 12 parallel attention heads whose outputs concatenate to drive a large feed-forward network. To reduce computation latency, several algorithmic optimizations have been proposed, e.g., a recent algorithm dynamically matches linguistic complexity with model sizes via entropy-based early exit. Deploying such transformer models on edge platforms requires careful co-design and optimizations from algorithms to circuits, where energy consumption is a key design consideration.
more » « less
Full Text Available
CoopMC: Algorithm-Architecture Co-Optimization for Markov Chain Monte Carlo Accelerators

https://doi.org/10.1109/HPCA53966.2022.00012

Chai, Yuji; Ko, Glenn G.; Mark Ting, Wei-Te; Bailey, Luke; Brooks, David; Wei, Gu-Yeon (April 2022, IEEE International Symposium on High-Performance Computer Architecture (HPCA))

Full Text Available

« Prev Next »

Search for: All records