NSF PAR Search | NSF Public Access Repository

Characterizing Soft-Error Resiliency in Arm's Ethos-U55 Embedded Machine Learning Accelerator

https://doi.org/10.1109/ISPASS61541.2024.00019

Tyagi, Abhishek; Jeyapaul, Reiley; Zhou, Chuteng; Whatmough, Paul; Zhu, Yuhao (May 2024, IEEE)

Full Text Available
AR-PIM: An Adaptive-Range Processing-in-Memory Architecture

https://doi.org/10.1109/ISLPED58423.2023.10244186

Chou, Teyuh; Garcia-Redondo, Fernando; Whatmough, Paul; Zhang, Zhengya (August 2023, IEEE)

Full Text Available
Thales: Formulating and Estimating Architectural Vulnerability Factors for DNN Accelerators

Tyagi, Abhishek; Gan, Yiming; Liu, Shaoshan; Yu, Bo; Whatmough, Paul; Zhu, Yuhao (March 2023, IEEE International Symposium on High-Performance Computer Architecture)

Full Text Available
22.9 A 12nm 18.1TFLOPs/W Sparse Transformer Processor with Entropy-Based Early Exit, Mixed-Precision Predication and Fine-Grained Power Management

https://doi.org/10.1109/ISSCC42615.2023.10067817

Tambe, Thierry; Zhang, Jeff; Hooper, Coleman; Jia, Tianyu; Whatmough, Paul N.; Zuckerman, Joseph; Santos, Maico Cassel; Loscalzo, Erik Jens; Giri, Davide; Shepard, Kenneth; et al. (February 2023, 2023 IEEE International Solid-State Circuits Conference (ISSCC))

Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass using the state-of-the-art BERT-base model must serially traverse through 12 computationally intensive transformer layers, each layer containing 12 parallel attention heads whose outputs concatenate to drive a large feed-forward network. To reduce computation latency, several algorithmic optimizations have been proposed, e.g., a recent algorithm dynamically matches linguistic complexity with model sizes via entropy-based early exit. Deploying such transformer models on edge platforms requires careful co-design and optimizations from algorithms to circuits, where energy consumption is a key design consideration.
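The entropy-based early exit mentioned in the abstract can be illustrated with a minimal sketch. It assumes each transformer layer is followed by a small classifier that emits logits, and that inference stops at the first layer whose prediction entropy falls below a threshold. The function names, the threshold value, and the per-layer classifier are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_inference(
    layer_logits: Iterable[Sequence[float]], threshold: float
) -> Tuple[int, int]:
    """Return (predicted_label, layers_used).

    `layer_logits` yields the logits of a hypothetical per-layer classifier,
    in layer order. Passing a generator means later layers are only computed
    if no earlier layer was confident enough.
    """
    depth = 0
    probs: List[float] = []
    for depth, logits in enumerate(layer_logits, start=1):
        probs = softmax(logits)
        # Low entropy means a confident prediction, so stop here.
        if entropy(probs) < threshold:
            break
    if depth == 0:
        raise ValueError("layer_logits must yield at least one layer")
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, depth
```

With a 12-layer model such as BERT-base, an input that is easy to classify would leave the loop after a few layers, while a hard input would run all 12. The threshold trades accuracy against latency and energy and would need to be tuned per task.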
Full Text Available
SMIV: A 16-nm 25-mm² SoC for IoT With Arm Cortex-A53, eFPGA, and Coherent Accelerators

https://doi.org/10.1109/JSSC.2021.3115466

Lee, Sae Kyu; Whatmough, Paul N.; Donato, Marco; Ko, Glenn G.; Brooks, David; Wei, Gu-Yeon (February 2022, IEEE Journal of Solid-State Circuits)

Full Text Available
A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs

https://doi.org/10.1109/JSSC.2022.3179303

Tambe, Thierry; Yang, En-Yu; Ko, Glenn G.; Chai, Yuji; Hooper, Coleman; Donato, Marco; Whatmough, Paul N.; Rush, Alexander M.; Brooks, David; Wei, Gu-Yeon (June 2022, IEEE Journal of Solid-State Circuits)

Full Text Available
FixyFPGA: Efficient FPGA Accelerator for Deep Neural Networks with High Element-Wise Sparsity and without External Memory Access

https://doi.org/10.1109/FPL53798.2021.00010

Meng, Jian; Venkataramanaiah, Shreyas Kolala; Zhou, Chuteng; Hansen, Patrick; Whatmough, Paul; Seo, Jae-sun (August 2021, 2021 31st International Conference on Field-Programmable Logic and Applications (FPL))

Full Text Available
9.8 A 25mm² SoC for IoT Devices with 18ms Noise-Robust Speech-to-Text Latency via Bayesian Speech Denoising and Attention-Based Sequence-to-Sequence DNN Speech Recognition in 16nm FinFET

https://doi.org/10.1109/ISSCC42613.2021.9366062

Tambe, Thierry; Yang, En-Yu; Ko, Glenn G.; Chai, Yuji; Hooper, Coleman; Donato, Marco; Whatmough, Paul N.; Rush, Alexander M.; Brooks, David; Wei, Gu-Yeon (February 2021, International Solid-State Circuits Conference)

Full Text Available
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference

https://doi.org/10.1145/3466752.3480095

Tambe, Thierry; Hooper, Coleman; Pentecost, Lillian; Jia, Tianyu; Yang, En-Yu; Donato, Marco; Sanh, Victor; Whatmough, Paul; Rush, Alexander M.; Brooks, David; et al. (October 2021, MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture)

Full Text Available
A 16nm 25mm² SoC with a 54.5x Flexibility-Efficiency Range from Dual-Core Arm Cortex-A53 to eFPGA and Cache-Coherent Accelerators

https://doi.org/10.23919/VLSIC.2019.8778002

Whatmough, Paul N.; Lee, Sae Kyu; Donato, Marco; Hsueh, Hsea-Ching; Xi, Sam Likun; Gupta, Udit; Pentecost, Lillian; Ko, Glenn G.; Brooks, David; Wei, Gu-Yeon (June 2019, 2019 Symposium on VLSI Circuits)

Full Text Available
