NSF PAR Search | NSF Public Access Repository

Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration

Yu, Zhongzhi; Wang, Zheng; Fu, Yonggan; Shi, Huihong; Shaikh, Khalid; Lin, Yingyan Celine (July 2024, Proceedings of Machine Learning Research)

Full Text Available
INVITED: Data4AIGChip: An Automated Data Generation and Validation Flow for LLM-assisted Hardware Design

Zhang, Yongan; Fu, Yonggan; Yu, Zhongzhi; Zhao, Kevin; Wan, Cheng; Li, Chaojian; Lin, Yingyan Celine (June 2024, ACM)

Full Text Available
ML-Based Feedback-Free Adaptive MCS Selection for Massive Multi-User MIMO

https://doi.org/10.1109/IEEECONF59524.2023.10476866

An, Qing; Zafari, Mehdi; Dick, Chris; Segarra, Santiago; Sabharwal, Ashutosh; Doost-Mohammady, Rahman (October 2023, IEEE)

Full Text Available
On the Design of Reconfigurable Edge Devices for RF Fingerprint Identification (RED-RFFI) for IoT Systems

https://doi.org/10.1109/IEEECONF59524.2023.10476864

Keller, Thomas; Cavallaro, Joseph R (October 2023, IEEE)

Full Text Available
Enabling Resilience in Virtualized RANs with Atlas

https://doi.org/10.1145/3570361.3613276

Xing, Jiarong; Gong, Junzhi; Foukas, Xenofon; Kalia, Anuj; Kim, Daehyeok; Kotaru, Manikanta (October 2023, ACM)

Full Text Available
A Unified Parallel CORDIC-Based Hardware Architecture for LSTM Network Acceleration

https://doi.org/10.1109/TC.2023.3268400

Mohamed, Nadya A.; Cavallaro, Joseph R. (October 2023, IEEE Transactions on Computers)

Full Text Available
Design and Implementation of an FPGA-Based DNN Architecture for Real-time Outlier Detection

https://doi.org/10.1007/s11265-023-01835-1

Mohamed, Nadya; Cavallaro, Joseph (July 2023, Journal of Signal Processing Systems)

Full Text Available
A Deep Reinforcement Learning-Based Resource Scheduler for Massive MIMO Networks

https://doi.org/10.1109/TMLCN.2023.3313988

An, Qing; Segarra, Santiago; Dick, Chris; Sabharwal, Ashutosh; Doost-Mohammady, Rahman (January 2023, IEEE Transactions on Machine Learning in Communications and Networking)
Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Yonggan Fu; Yang Zhang; Kaizhi Qian; Zhifan Ye; Zhongzhi Yu; Cheng-I Lai; Yingyan Lin (November 2022, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022))

Self-supervised learning (SSL) for rich speech representations has achieved empirical success in low-resource Automatic Speech Recognition (ASR) and other speech processing tasks, which can mitigate the necessity of a large amount of transcribed speech and thus has driven a growing demand for on-device ASR and other speech processing. However, advanced speech SSL models have become increasingly large, which contradicts the limited on-device resources. This gap could be more severe in multilingual/multitask scenarios requiring simultaneously recognizing multiple languages or executing multiple speech processing tasks. Additionally, strongly overparameterized speech SSL models tend to suffer from overfitting when finetuned on low-resource speech corpora. This work aims to enhance the practical usage of speech SSL models towards a win-win in both enhanced efficiency and alleviated overfitting via our proposed S-Router framework, which for the first time discovers that simply discarding no more than 10% of model weights via only finetuning model connections of speech SSL models can achieve better accuracy than standard weight finetuning on downstream speech processing tasks. More importantly, S-Router can serve as an all-in-one technique to enable (1) a new finetuning scheme, (2) an efficient multilingual/multitask solution, (3) a state-of-the-art pruning technique, and (4) a new tool to quantitatively analyze the learned speech representation.
Full Text Available
RT-NeRF: Real-Time On-Device Neural Radiance Fields Towards Immersive AR/VR Rendering

https://doi.org/10.1145/3508352.3549380

Chaojian Li; Sixu Li; Yang Zhao; Wenbo Zhu; Yingyan Lin (October 2022, 2022 IEEE/ACM International Conference on Computer-Aided Design)

Neural Radiance Field (NeRF) based rendering has attracted growing attention thanks to its state-of-the-art (SOTA) rendering quality and wide applications in Augmented and Virtual Reality (AR/VR). However, immersive real-time (> 30 FPS) NeRF based rendering enabled interactions are still limited due to the low achievable throughput on AR/VR devices. To this end, we first profile SOTA efficient NeRF algorithms on commercial devices and identify two primary causes of the aforementioned inefficiency: (1) the uniform point sampling and (2) the dense accesses and computations of the required embeddings in NeRF. Furthermore, we propose RT-NeRF, which to the best of our knowledge is the first algorithm-hardware co-design acceleration of NeRF. Specifically, on the algorithm level, RT-NeRF integrates an efficient rendering pipeline for largely alleviating the inefficiency due to the commonly adopted uniform point sampling method in NeRF by directly computing the geometry of pre-existing points. Additionally, RT-NeRF leverages a coarse-grained view-dependent computing ordering scheme for eliminating the (unnecessary) processing of invisible points. On the hardware level, our proposed RT-NeRF accelerator (1) adopts a hybrid encoding scheme to adaptively switch between a bitmap- or coordinate-based sparsity encoding format for NeRF’s sparse embeddings, aiming to maximize the storage savings and thus reduce the required DRAM accesses while supporting efficient NeRF decoding; and (2) integrates both a high-density sparse search unit and a dual-purpose bi-direction adder & search tree to coordinate the two aforementioned encoding formats. Extensive experiments on eight datasets consistently validate the effectiveness of RT-NeRF, achieving a large throughput improvement (e.g., 9.7×∼3,201×) while maintaining the rendering quality as compared with SOTA efficient NeRF solutions.
Full Text Available
