NSF PAR Search | NSF Public Access Repository

Nitro: Boosting Distributed Reinforcement Learning with Serverless Computing

https://doi.org/10.14778/3696435.3696441

Yu, Hanfei; Carter, Jacob; Wang, Hao; Tiwari, Devesh; Li, Jian; Park, Seung-Jong (September 2025, Proceedings of the VLDB Endowment)

Deep reinforcement learning (DRL) has demonstrated significant potential in various applications, including gaming AI, robotics, and system scheduling. DRL algorithms produce, sample, and learn from training data online through a trial-and-error process, demanding considerable time and computational resources. To address this, distributed DRL algorithms and paradigms have been developed to expedite training using extensive resources. Through carefully designed experiments, we are the first to observe that strategically increasing the actor-environment interactions by spawning more concurrent actors at certain training rounds within ephemeral time frames can significantly enhance training efficiency. Yet, current distributed DRL solutions, which are predominantly server-based (or serverful), fail to capitalize on these opportunities due to their long startup times, limited adaptability, and cumbersome scalability. This paper proposes Nitro, a generic training engine for distributed DRL algorithms that enforces timely and effective boosting with concurrent actors instantaneously spawned by serverless computing. With serverless functions, Nitro adjusts data sampling strategies dynamically according to the DRL training demands. Nitro seizes the opportunity of real-time boosting by accurately and swiftly detecting an empirical metric. To achieve cost efficiency, we design a heuristic actor scaling algorithm to guide Nitro for cost-aware boosting budget allocation. We integrate Nitro with state-of-the-art DRL algorithms and frameworks and evaluate them on AWS EC2 and Lambda. Experiments with Mujoco and Atari benchmarks show that Nitro improves the final rewards (i.e., training quality) by up to 6× and reduces training costs by up to 42%.
Full Text Available
AMPERE: A Generic Energy Estimation Approach for On-Device Training

https://doi.org/10.1145/3764944.3764951

Zhang, Jiaru; Wang, Zesong; Wang, Hao; Song, Tao; Su, Huai-an; Chen, Rui; Hua, Yang; Zhou, Xiangwei; Ma, Ruhui; Pan, Miao; et al. (August 2025, ACM SIGMETRICS Performance Evaluation Review)

Battery-powered mobile devices (e.g., smartphones, AR/VR glasses, and various IoT devices) are increasingly being used for AI training due to their growing computational power and easy access to valuable, diverse, and real-time data. On-device training is highly energy-intensive, making accurate energy consumption estimation crucial for effective job scheduling and sustainable AI. However, the heterogeneity of devices and the complexity of models challenge the accuracy and generalizability of existing methods. This paper proposes AMPERE, a generic approach for energy consumption estimation in deep neural network (DNN) training. First, we examine the layer-wise energy additivity property of DNNs and strategically partition the entire model into layers for fine-grained energy consumption profiling. Then, we fit Gaussian Process (GP) models to learn from layer-wise energy consumption measurements and estimate a DNN's overall energy consumption based on its layer-wise energy additivity property. We conduct extensive experiments with various types of models across different real-world platforms. The results demonstrate that AMPERE has effectively reduced the Mean Absolute Percentage Error (MAPE) by up to 30%. Moreover, AMPERE is applied in guiding energy-aware pruning, successfully reducing energy consumption by 50%, thereby further demonstrating its generality and potential.
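The additivity idea in this abstract can be illustrated with a minimal sketch. It assumes a model's training energy is approximately the sum of its layers' energies, and it swaps the paper's Gaussian Process estimators for a plain lookup table of per-layer values. All layer names, energy figures, and function names below are hypothetical and are not taken from AMPERE.

```python
from statistics import mean


def estimate_model_energy(layer_names, per_layer_estimates):
    """Sum per-layer energy estimates (joules), assuming layer-wise additivity."""
    return sum(per_layer_estimates[name] for name in layer_names)


def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed in percent."""
    return 100.0 * mean(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


# Hypothetical per-layer estimates in joules per training step.
# A lookup table stands in for the fitted per-layer estimators.
layer_energy = {"conv": 2.0, "relu": 0.5, "linear": 1.5}

# Two made-up models described as sequences of layer types.
model_a = ["conv", "relu", "conv", "relu", "linear"]
model_b = ["linear", "relu", "linear"]

predicted = [estimate_model_energy(m, layer_energy) for m in (model_a, model_b)]
measured = [8.0, 4.0]  # hypothetical whole-model measurements in joules

error = mape(measured, predicted)
print(predicted, error)
```

With these made-up numbers the whole-model estimates are 6.5 J and 3.5 J, and the MAPE against the assumed measurements is 15.625%. The gap between estimate and measurement is the kind of residual that a learned per-layer model would aim to shrink.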
Full Text Available
WHALE-FL: Wireless and Heterogeneity Aware Latency Efficient Federated Learning over Mobile Devices via Adaptive Subnetwork Scheduling

https://doi.org/10.1609/aaai.v39i19.34272

Su, Huai-An; Geng, Jiaxiang; Li, Liang; Qin, Xiaoqi; Hou, Yanzhao; Wang, Hao; Fu, Xin; Pan, Miao (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

As a popular distributed learning paradigm, federated learning (FL) over mobile devices fosters numerous applications, but its practical deployment is hindered by the computing and communication heterogeneity of participating devices. Some pioneering research efforts proposed extracting subnetworks from the global model and assigning as large a subnetwork as possible to each device for local training based on its full computing capacity. Although such fixed-size subnetwork assignment enables FL training over heterogeneous mobile devices, it is unaware of (i) the dynamic changes in devices' communication and computing conditions and (ii) FL training progress and its dynamic requirements for local training contributions, both of which may cause very long FL training delays. Motivated by those dynamics, in this paper, we develop a wireless and heterogeneity aware latency efficient FL (WHALE-FL) approach to accelerate FL training through adaptive subnetwork scheduling. Instead of sticking to a fixed-size subnetwork, WHALE-FL introduces a novel subnetwork selection utility function to capture device and FL training dynamics, and guides the mobile device to adaptively select the subnetwork size for local training based on (a) its computing and communication capacity, (b) its dynamic computing and/or communication conditions, and (c) FL training status and its corresponding requirements for local training contributions. Our evaluation shows that, compared with peer designs, WHALE-FL effectively accelerates FL training without sacrificing learning accuracy.
Full Text Available
pFedGPT: Hierarchically Optimizing LoRA Aggregation Weights for Personalized Federated GPT Models

https://doi.org/10.18653/v1/2025.emnlp-main.239

Shen, Zhanming; Xu, Tianqi; Wang, Hao; Li, Jian; Pan, Miao (January 2025, Association for Computational Linguistics)

Full Text Available
Temporal Contrastive Learning for Sensor-Based Human Activity Recognition: A Self-Supervised Approach

https://doi.org/10.1109/jsen.2024.3491933

Chen, Xiaobing; Zhou, Xiangwei; Sun, Mingxuan; Wang, Hao (January 2025, IEEE Sensors Journal)

Full Text Available
Pre-Warming is Not Enough: Accelerating Serverless Inference With Opportunistic Pre-Loading

https://doi.org/10.1145/3698038.3698509

Sui, Yifan; Yu, Hanfei; Hu, Yitao; Li, Jianxun; Wang, Hao (November 2024, ACM)

Full Text Available
Stellaris: Staleness-Aware Distributed Reinforcement Learning with Serverless Computing

https://doi.org/10.1109/SC41406.2024.00045

Yu, Hanfei; Wang, Hao; Tiwari, Devesh; Li, Jian; Park, Seung-Jong (November 2024, IEEE)

Full Text Available
Freyr+: Harvesting Idle Resources in Serverless Computing via Deep Reinforcement Learning

https://doi.org/10.1109/TPDS.2024.3462294

Yu, Hanfei; Wang, Hao; Li, Jian; Yuan, Xu; Park, Seung-Jong (November 2024, IEEE Transactions on Parallel and Distributed Systems)

Full Text Available