Search Results
Search for: All records
Total Resources: 2
Author / Contributor:
- Giri, Davide (2)
- Zuckerman, Joseph (2)
- Adve, Sarita V (1)
- Block, Charles (1)
- Brooks, David (1)
- Carloni, Luca (1)
- Carloni, Luca P (1)
- Hooper, Coleman (1)
- Jia, Tianyu (1)
- Jin, Naiyin (1)
- Jing, Ying (1)
- Loscalzo, Erik Jens (1)
- Mantovani, Paolo (1)
- Mishra, Bakshree (1)
- Rush, Alexander (1)
- Santos, Maico Cassel (1)
- Shepard, Kenneth (1)
- Suresh, Vignesh (1)
- Tambe, Thierry (1)
- Wei, Gu-Yeon (1)
Free, publicly-accessible full text available October 13, 2025
Tambe, Thierry; Zhang, Jeff; Hooper, Coleman; Jia, Tianyu; Whatmough, Paul N.; Zuckerman, Joseph; Santos, Maico Cassel; Loscalzo, Erik Jens; Giri, Davide; Shepard, Kenneth; et al. (2023 IEEE International Solid-State Circuits Conference (ISSCC))

Large language models have substantially advanced nuance and context understanding in natural language processing (NLP), further fueling the growth of intelligent conversational interfaces and virtual assistants. However, their hefty computational and memory demands make them potentially expensive to deploy on cloudless edge platforms with strict latency and energy requirements. For example, an inference pass with the state-of-the-art BERT-base model must serially traverse 12 computationally intensive transformer layers, each containing 12 parallel attention heads whose outputs are concatenated to drive a large feed-forward network. To reduce computation latency, several algorithmic optimizations have been proposed; for example, a recent algorithm dynamically matches linguistic complexity to model size via entropy-based early exit. Deploying such transformer models on edge platforms requires careful co-design and optimization from algorithms to circuits, where energy consumption is a key design consideration.
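The entropy-based early-exit scheme the abstract alludes to can be sketched in a few lines: after each encoder layer, an intermediate classifier scores the hidden state, and inference stops as soon as the prediction's softmax entropy drops below a confidence threshold. This is an illustrative simplification, not the paper's implementation; the `layers` and `classifiers` callables below are hypothetical stand-ins for the 12 BERT-base encoder layers and per-layer exit heads.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D logit vector.
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def entropy(probs):
    # Shannon entropy in nats; low entropy means a confident prediction.
    return float(-(probs * np.log(probs + 1e-12)).sum())

def early_exit_inference(layers, classifiers, x, threshold=0.2):
    """Run encoder layers one at a time; after each, an exit head
    (`classifiers[i]`) produces class logits from the hidden state.
    Exit as soon as the softmax entropy falls below `threshold`,
    or at the final layer regardless.

    Returns (predicted_class, layers_used)."""
    h = x
    for i, (layer, clf) in enumerate(zip(layers, classifiers), start=1):
        h = layer(h)
        probs = softmax(clf(h))
        if entropy(probs) < threshold or i == len(layers):
            return int(np.argmax(probs)), i
```

Easy inputs (low linguistic complexity) become confidently classified after a few layers and skip the rest, which is where the latency and energy savings come from; hard inputs still pay for the full 12-layer pass.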