NSF PAR Search | NSF Public Access Repository

PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation

Xue, Qiyao; Yin, Xiangyu; Yang, Boyuan; Gao, Wei (June 2025, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.)

Text-to-video (T2V) generation has been recently enabled by transformer-based diffusion models, but current T2V models lack capabilities in adhering to real-world common knowledge and physical rules, due to their limited understanding of physical realism and deficiency in temporal modeling. Existing solutions are either data-driven or require extra model inputs, but cannot generalize to out-of-distribution domains. In this paper, we present PhyT2V, a new data-independent T2V technique that expands the current T2V model’s capability of video generation to out-of-distribution domains, by enabling chain-of-thought and step-back reasoning in T2V prompting. Our experiments show that PhyT2V improves existing T2V models’ adherence to real-world physical rules by 2.3x, and achieves 35% improvement compared to T2V prompt enhancers.
Free, publicly-accessible full text available June 11, 2026
Tackling Intertwined Data and Device Heterogeneities in Federated Learning with Unlimited Staleness

https://doi.org/10.1609/aaai.v39i20.35405

Wang, Haoming; Gao, Wei (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Federated Learning (FL) can be affected by data and device heterogeneities, caused by clients' different local data distributions and latencies in uploading model updates (i.e., staleness). Traditional schemes consider these heterogeneities as two separate and independent aspects, but this assumption is unrealistic in practical FL scenarios where these heterogeneities are intertwined. In these cases, traditional FL schemes are ineffective, and a better approach is to convert a stale model update into an unstale one. In this paper, we present a new FL framework that ensures the accuracy and computational efficiency of this conversion, hence effectively tackling the intertwined heterogeneities that may cause unlimited staleness in model updates. Our basic idea is to estimate the distributions of clients' local training data from their uploaded stale model updates, and use these estimations to compute unstale client model updates. In this way, our approach does not require any auxiliary dataset nor the clients' local models to be fully trained, and does not incur any additional computation or communication overhead at client devices. We compared our approach with the existing FL strategies on mainstream datasets and models, and showed that our approach can improve the trained model accuracy by up to 25% and reduce the number of required training epochs by up to 35%.
Free, publicly-accessible full text available April 11, 2026
Perceptual-Centric Image Super-Resolution using Heterogeneous Processors on Mobile Devices

https://doi.org/10.1145/3636534.3690698

Huang, Kai; Yin, Xiangyu; Gu, Tao; Gao, Wei (December 2024, in Proceedings of the 30th ACM International Conference on Mobile Computing and Networking (MobiCom), 2024.)

Image super-resolution (SR) is widely used on mobile devices to enhance user experience. However, neural networks used for SR are computationally expensive, posing challenges for mobile devices with limited computing power. A viable solution is to use heterogeneous processors on mobile devices, especially the specialized hardware AI accelerators, for SR computations, but the reduced arithmetic precision on AI accelerators can lead to degraded perceptual quality in upscaled images. To address this limitation, in this paper we present SR For Your Eyes (FYE-SR), a novel image SR technique that enhances the perceptual quality of upscaled images when using heterogeneous processors for SR computations. FYE-SR strategically splits the SR model and dispatches different layers to heterogeneous processors, to meet the time constraint of SR computations while minimizing the impact of AI accelerators on image quality. Experiment results show that FYE-SR outperforms the best baselines, improving perceptual image quality by up to 2x, or reducing SR computing latency by up to 5.6x with on-par image quality.
Free, publicly-accessible full text available December 4, 2025
Towards Green AI in Fine-Tuning Large Language Models via Adaptive Backpropagation

Huang, Kai; Yin, Hanyun; Huang, Heng; Gao, Wei (May 2024, in Proceedings of the 12th International Conference on Learning Representations (ICLR), 2024.)

Fine-tuning is essential to adapting pre-trained large language models to downstream applications. With the increasing popularity of LLM-enabled applications, fine-tuning has been performed intensively worldwide, incurring a tremendous amount of computing costs that correspond to a large carbon footprint and environmental impact. Mitigating such environmental impact directly correlates to reducing the fine-tuning FLOPs. Existing fine-tuning schemes focus on either saving memory or reducing the overhead of computing weight updates, but cannot achieve sufficient FLOPs reduction because they ignore the training cost in backpropagation. To address this limitation, in this paper we present GreenTrainer, a new technique that minimizes the FLOPs of LLM fine-tuning via adaptive backpropagation, which adaptively selects the most appropriate set of LLM tensors for fine-tuning based on their importance and backpropagation cost in training. Experiment results show that GreenTrainer can save up to 64% training FLOPs compared to full fine-tuning, without any noticeable accuracy loss. Compared to the existing schemes such as Prefix Tuning and LoRA, GreenTrainer can achieve up to 4% improvement of model accuracy, with on-par FLOPs reduction.
Full Text Available
ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection

https://doi.org/10.1145/3581791.3596852

Huang, Kai; Yang, Boyuan; Gao, Wei (July 2023, ACM)

Full Text Available
PTEase: Objective Airway Examination for Pulmonary Telemedicine using Commodity Smartphones

https://doi.org/10.1145/3581791.3596854

Yin, Xiangyu; Huang, Kai; Forno, Erick; Chen, Wei; Huang, Heng; Gao, Wei (June 2023, Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services)

Remote monitoring and evaluation of pulmonary diseases via telemedicine are important to disease diagnosis and management, but current telemedicine solutions have limited capability of objectively examining the airway's internal physiological conditions that are crucial to pulmonary disease evaluation. Existing solutions based on smartphone sensing are also limited to externally monitoring breath rates, respiratory events, or lung function. In this paper, we present PTEase, a new system design that addresses these limitations and uses commodity smartphones to examine the airway's internal physiological conditions. PTEase uses active acoustic sensing to measure the internal changes of lower airway caliber, and then leverages machine learning to analyze the sensory data for pulmonary disease evaluation. We implemented PTEase as a smartphone app, and verified its measurement error in lab-controlled settings as <10%. Clinical studies further showed that PTEase reaches 75% accuracy on disease prediction and 11%-15% errors in estimating lung function indices. Given that such accuracy is comparable with that in clinical practice using spirometry, PTEase can be reliably used as an assistive telemedicine tool for disease evaluation and monitoring.
Full Text Available
Out-Clinic Pulmonary Disease Evaluation via Acoustic Sensing and Multi-Task Learning on Commodity Smartphones

https://doi.org/10.1145/3560905.3568437

Yin, Xiangyu; Huang, Kai; Forno, Erick; Chen, Wei; Huang, Heng; Gao, Wei (November 2022, Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems)

Pulmonary diseases, such as asthma and Chronic Obstructive Pulmonary Disease (COPD), constitute a major public health challenge. The disease symptoms, including airway obstruction and inflammation, usually result in changes in airway mechanical properties, such as the caliber and impedance of the airway. To measure such airway properties for disease evaluation and diagnosis purposes, pulmonary function tests (PFT) have been widely adopted. However, most existing PFT systems require expensive and cumbersome hardware that cannot be used outside the clinic. To allow out-clinic continuous pulmonary disease evaluation, in this paper we present AWARE, a new sensing and AI system that supports accurate and reliable PFT using commodity smartphones. AWARE uses a smartphone to transmit acoustic signals and reconstructs the profile of the human airway based on the analysis of reflected acoustic waves captured from the smartphone's microphone. The subject's pulmonary condition is then evaluated by a multi-task learning model that integrates both the airway measurements and the subject's lung function records as the ground truth. Evaluations on 75 human subjects demonstrate that AWARE can achieve 80% accuracy in distinguishing between humans with healthy pulmonary function and those with asthma symptoms.
Full Text Available
AiFi: AI-Enabled WiFi Interference Cancellation with Commodity PHY-Layer Information

https://doi.org/10.1145/3560905.3568537

Chen, Ruirong; Huang, Kai; Gao, Wei (November 2022, Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems)

Full Text Available
Real-time neural network inference on extremely weak devices: agile offloading with explainable AI

https://doi.org/10.1145/3495243.3560551

Huang, Kai; Gao, Wei (October 2022, Proceedings of the 28th Annual International Conference on Mobile Computing And Networking)

Full Text Available
