NSF PAR Search | NSF Public Access Repository

CHEF: A Framework for Deploying Heterogeneous Models on Clusters with Heterogeneous FPGAs

Tang, Yue; Song, Yukai; Elango, Naveena; Priya, Sheena R; Jones, Alex K; Xiong, Jinjun; Zhou, Peipei; Hu, Jingtong (October 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Personalized Meta-Federated Learning for IoT-Enabled Health Monitoring

https://doi.org/10.1109/TCAD.2024.3388908

Jia, Zhenge; Zhou, Tianren; Yan, Zheyu; Hu, Jingtong; Shi, Yiyu (October 2024, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

Qin, Ruiyang; Xia, Jun; Jia, Zhenge; Jiang, Meng; Abbasi, Ahmed; Zhou, Peipei; Hu, Jingtong; Shi, Yiyu (June 2024, ACM/IEEE)

After a large language model (LLM) is deployed on edge devices, it is desirable for these devices to learn from user-generated conversation data to generate user-specific and personalized responses in real time. However, user-generated data usually contains sensitive and private information, and uploading such data to the cloud for annotation is discouraged, if not prohibited. While it is possible to obtain annotation locally by directly asking users to provide preferred responses, such annotations must be sparse so as not to degrade the user experience. In addition, the storage of edge devices is usually too limited to enable large-scale fine-tuning with full user-generated data. It remains an open question how to enable on-device LLM personalization, considering sparse annotation and limited on-device storage. In this paper, we propose a novel framework to select and store the most representative data online in a self-supervised way. Such data has a small memory footprint and allows infrequent requests of user annotations for further fine-tuning. To enhance fine-tuning quality, multiple semantically similar pairs of question texts and expected responses are generated using the LLM. Our experiments show that the proposed framework achieves the best user-specific content-generating capability (accuracy) and fine-tuning speed (performance) compared with vanilla baselines. To the best of our knowledge, this is the first on-device LLM personalization framework.
Full Text Available
Enabling On-Device Large Language Model Personalization with Self-Supervised Data Selection and Synthesis

https://doi.org/10.1145/3649329.3655665

Qin, Ruiyang; Xia, Jun; Jia, Zhenge; Jiang, Meng; Abbasi, Ahmed; Zhou, Peipei; Hu, Jingtong; Shi, Yiyu (June 2024, ACM)

Full Text Available
Synthetic Data Can Also Teach: Synthesizing Effective Data for Unsupervised Visual Representation Learning

https://doi.org/10.1609/aaai.v37i3.25388

Wu, Yawen; Wang, Zhepeng; Zeng, Dewen; Shi, Yiyu; Hu, Jingtong (June 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

Contrastive learning (CL), a self-supervised learning approach, can effectively learn visual representations from unlabeled data. Given the CL training data, generative models can be trained to generate synthetic data to supplement the real data. Using both synthetic and real data for CL training has the potential to improve the quality of learned representations. However, synthetic data usually has lower quality than real data, and using synthetic data may not improve CL compared with using real data. To tackle this problem, we propose a data generation framework with two methods to improve CL training by joint sample generation and contrastive learning. The first approach generates hard samples for the main model. The generator is jointly learned with the main model to dynamically customize hard samples based on the training state of the main model. In addition, a pair of data generators is proposed to generate similar but distinct samples as positive pairs. In joint learning, the hardness of a positive pair is progressively increased by decreasing their similarity. Experimental results on multiple datasets show superior accuracy and data efficiency of the proposed data generation methods applied to CL. For example, about 4.0%, 3.5%, and 2.6% accuracy improvements for linear classification are observed on ImageNet-100, CIFAR-100, and CIFAR-10, respectively. Moreover, up to 2× data efficiency for linear classification and up to 5× data efficiency for transfer learning are achieved.
Full Text Available
Self-supervised On-device Federated Learning from Unlabeled Streams

https://doi.org/10.1109/TCAD.2023.3274956

Shi, Jiahe; Wu, Yawen; Zeng, Dewen; Tao, Jun; Hu, Jingtong; Shi, Yiyu (January 2023, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
Low-power object-detection challenge on unmanned aerial vehicles

https://doi.org/10.1038/s42256-022-00567-4

Jia, Zhenge; Xu, Xiaowei; Hu, Jingtong; Shi, Yiyu (December 2022, Nature Machine Intelligence)

Full Text Available
Enabling Weakly Supervised Temporal Action Localization From On-Device Learning of the Video Stream

https://doi.org/10.1109/TCAD.2022.3197536

Tang, Yue; Wu, Yawen; Zhou, Peipei; Hu, Jingtong (November 2022, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems)

Full Text Available
EF-Train: Enable Efficient On-device CNN Training on FPGA through Data Reshaping for Online Adaptation or Personalization

https://doi.org/10.1145/3505633

Tang, Yue; Zhang, Xinyi; Zhou, Peipei; Hu, Jingtong (September 2022, ACM Transactions on Design Automation of Electronic Systems)

Conventionally, DNN models are trained once in the cloud and deployed on edge devices such as cars, robots, or unmanned aerial vehicles (UAVs) for real-time inference. However, there are many cases that require the models to adapt to new environments, domains, or users. In order to realize such domain adaptation or personalization, the models need to be continuously trained on the device. In this work, we design EF-Train, an efficient DNN training accelerator with a unified channel-level parallelism-based convolution kernel that can achieve end-to-end training on resource-limited low-power edge-level FPGAs. It is challenging to implement on-device training on resource-limited FPGAs due to the low efficiency caused by the different memory access patterns of forward propagation, backward propagation, and weight update. Therefore, we developed a data reshaping approach with intra-tile continuous memory allocation and weight reuse. An analytical model is established to automatically schedule computation and memory resources to achieve high energy efficiency on edge FPGAs. The experimental results show that our design achieves 46.99 GFLOPS and 6.09 GFLOPS/W in terms of throughput and energy efficiency, respectively.
Full Text Available
Decentralized Unsupervised Learning of Visual Representations

https://doi.org/10.24963/ijcai.2022/323

Wu, Y.; Wang, Z.; Zeng, D.; Li, M.; Shi, Y.; Hu, J. (July 2022, International Joint Conferences on Artificial Intelligence)

Full Text Available