

Search for: All records

Creators/Authors contains: "Qiu, H"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Gibbons, Phillip B; Pekhimenko, Gennady; De Sa, Christopher (Ed.)
    The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent ML-centric cloud platforms from being production-ready. In this paper, we focus on the challenges of model performance variability and costly model retraining, introduced by dynamic workload patterns and heterogeneous applications and infrastructures in cloud environments. To address these challenges, we present FLASH, an extensible framework for fast model adaptation in ML-based system management tasks. We show how FLASH leverages existing ML agents and their training data to learn to generalize across applications/environments with meta-learning. FLASH can be easily integrated with an existing ML-based system management agent with a unified API. We demonstrate the use of FLASH by implementing three existing ML agents that manage (1) resource configurations, (2) autoscaling, and (3) server power. Our experiments show that FLASH enables fast adaptation to new, previously unseen applications/environments (e.g., 5.5× faster than transfer learning in the autoscaling task), indicating significant potential for adopting ML-centric cloud platforms in production. 
    Free, publicly-accessible full text available September 1, 2025
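The fast-adaptation idea in this record, meta-learning a shared initialization from existing agents' training data so that a new application or environment needs only a few gradient updates, can be sketched minimally. The sketch below is a generic first-order meta-learning toy (Reptile-style) on one-dimensional quadratic tasks; every name, constant, and model here is an illustrative assumption, not FLASH's actual algorithm or API.

```python
# Reptile-style first-order meta-learning over simple 1-D quadratic "tasks".
# Each task c stands in for one application/environment an ML agent manages.

def task_loss_grad(w, c):
    """Gradient of the per-task loss f_c(w) = (w - c)^2."""
    return 2.0 * (w - c)

def adapt(w, c, steps, lr=0.1):
    """Plain gradient descent on one task, starting from initialization w."""
    for _ in range(steps):
        w -= lr * task_loss_grad(w, c)
    return w

def meta_train(task_centers, meta_steps=200, inner_steps=5, meta_lr=0.2):
    """Reptile: nudge the shared init toward each task's adapted weights."""
    w0 = 0.0
    for i in range(meta_steps):
        c = task_centers[i % len(task_centers)]
        w_adapted = adapt(w0, c, inner_steps)
        w0 += meta_lr * (w_adapted - w0)
    return w0

centers = [2.0, 3.0, 4.0]            # three known "environments"
w_meta = meta_train(centers)          # shared meta-learned initialization
w_new = adapt(w_meta, 3.5, steps=3)   # few-step adaptation to an unseen task
```

Starting from the meta-learned initialization, a handful of updates suffices on the unseen task, which is the effect the abstract quantifies as faster adaptation than transfer learning.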
  3. Begnum, Kyrre; Border, Charles (Ed.)
    With the increasing popularity of large deep learning model-serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while satisfying throughput and model-serving latency requirements. Model multiplexing approaches such as model parallelism, model placement, replication, and batching aim to optimize model-serving performance. However, they fall short of leveraging the GPU frequency scaling opportunity for power saving. In this paper, we demonstrate (1) the benefits of GPU frequency scaling in power saving for model serving; and (2) the necessity for co-design and optimization of fine-grained model multiplexing and GPU frequency scaling. We explore the co-design space and present a novel power-aware model-serving system, μ-Serve. μ-Serve is a model-serving framework that optimizes the power consumption and model-serving latency/throughput of serving multiple ML models efficiently in a homogeneous GPU cluster. Evaluation results on production workloads show that μ-Serve achieves 1.2–2.6× power saving by dynamic GPU frequency scaling (up to 61% reduction) without SLO attainment violations. 
    Free, publicly-accessible full text available September 1, 2025
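The core intuition of power-aware frequency scaling in this record can be sketched as: pick the lowest GPU frequency (hence lowest power) whose predicted serving latency still meets the SLO. The latency and power models below are illustrative placeholders under simple scaling assumptions, not μ-Serve's actual models or policy.

```python
# Toy frequency selection: lowest clock that still satisfies the latency SLO.

def predict_latency_ms(freq_mhz, base_ms=20.0, ref_mhz=1500.0):
    # Assume latency scales roughly inversely with clock frequency.
    return base_ms * ref_mhz / freq_mhz

def predict_power_w(freq_mhz, k=1e-7):
    # Assume dynamic power grows superlinearly with frequency (~f^3).
    return k * freq_mhz ** 3

def pick_frequency(freqs_mhz, slo_ms):
    """Lowest frequency whose predicted latency meets the SLO."""
    feasible = [f for f in freqs_mhz if predict_latency_ms(f) <= slo_ms]
    return min(feasible) if feasible else max(freqs_mhz)

freqs = [600, 900, 1200, 1500]                 # hypothetical supported clocks
chosen = pick_frequency(freqs, slo_ms=30.0)     # 1200 MHz under these models
saving = 1 - predict_power_w(chosen) / predict_power_w(max(freqs))
```

Under the cubic power assumption, even a modest clock reduction yields a large power saving, which is why co-designing frequency scaling with model multiplexing is attractive.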
  5. The development of lithium-ion battery technology has ensured that battery thermal management systems are an essential component of the battery pack for next-generation energy storage systems. Using dielectric immersion cooling, researchers have demonstrated the ability to attain high heat transfer rates due to the direct contact between cells and the coolant. However, feedback control has not been widely applied to immersion cooling schemes. Furthermore, current research has not considered battery pack plant design when optimizing feedback control. Uncertainties are inherent in the cooling equipment, resulting in temperature and flow rate fluctuations. Hence, it is crucial to systematically consider these uncertainties during cooling system design to improve the performance and reliability of the battery pack. To fill this gap, we established a reliability-based control co-design optimization framework using machine learning for immersion cooled battery packs. We first developed an experimental setup for 21700 battery immersion cooling, and the experiment data were used to build a high-fidelity multiphysics finite element model. The model can precisely represent the electrical and thermal profile of the battery. We then developed surrogate models based on the finite element simulations in order to reduce computational cost. The reliability-based control co-design optimization was employed to find the best plant and control design for the cooling system, in which an outer optimization loop minimized the cooling system cost while an inner loop ensured battery pack reliability. Finally, an optimal cooling system design was obtained and validated, which showed a 90% saving in cooling system energy consumption. 
    Free, publicly-accessible full text available July 14, 2025
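The nested structure of the reliability-based control co-design described above, an outer loop minimizing cooling-system cost and an inner loop enforcing battery-pack reliability, can be sketched with toy models. The thermal and cost functions below are illustrative stand-ins, not the paper's finite element or surrogate models.

```python
# Bi-level co-design sketch: outer cost minimization over plant/control
# designs, inner feasibility check standing in for the reliability loop.
import itertools

def peak_temp_c(flow_lpm, inlet_c):
    # Toy thermal model: more coolant flow and a cooler inlet lower peak temp.
    return inlet_c + 30.0 / flow_lpm

def energy_cost(flow_lpm, inlet_c):
    # Toy cost model: pumping cost grows with flow; chilling cost grows as
    # the inlet temperature is pushed below ambient (25 C here).
    return 2.0 * flow_lpm + (25.0 - inlet_c)

def co_design(flows, inlets, temp_limit_c=45.0):
    """Outer loop: minimize cost over designs passing the inner check."""
    feasible = [(f, t) for f, t in itertools.product(flows, inlets)
                if peak_temp_c(f, t) <= temp_limit_c]  # inner reliability check
    return min(feasible, key=lambda d: energy_cost(*d))

best = co_design(flows=[1.0, 2.0, 4.0], inlets=[15.0, 20.0, 25.0])
```

The design choice mirrors the abstract: feasibility (reliability) is enforced inside the search rather than penalized afterward, so the returned optimum is guaranteed to respect the thermal limit.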
  6. Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
    Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in isolation, while in practice the environment is often evolving, leaving many related tasks to be solved. In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretical solution concepts than learning each task separately. As an important intermediate step, we develop multiple MARL algorithms with initialization-dependent convergence guarantees. Such algorithms integrate optimistic policy mirror descent with stage-based value updates, and their refined convergence guarantees (nearly) recover the best known results even when a good initialization is unknown. To the best of our knowledge, such results are also new and might be of independent interest. We further provide numerical simulations to corroborate our theoretical findings. 
  7.
    This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on the computing infrastructures, such as hyperscale cloud datacenters, that consume significant power, thus producing increasing amounts of carbon emissions. Our goal is to (1) optimize the usage of green energy sources (e.g., solar energy), which is desirable but expensive and relatively unstable, and (2) continuously reduce the use of fossil fuels, which have a lower cost but a significant negative societal impact. Meanwhile, cloud datacenters strive to meet their customers’ requirements, e.g., service-level objectives (SLOs) in application latency or throughput, which are impacted by infrastructure resilience and availability. We propose a scalable formulation that combines sustainability, cloud resilience, and performance as a joint optimization problem with multiple interdependent objectives to address these issues holistically. Given the complexity and dynamicity of the problem, machine learning (ML) approaches, such as reinforcement learning, are essential for achieving continuous optimization. Our study highlights the challenges of green energy instability, which necessitates innovative ML-centric solutions across heterogeneous infrastructures to manage the transition towards green computing. Underlying the ML-centric solutions must be methods to combine classic system resilience techniques with innovations in real-time ML resilience (not addressed heretofore). We believe that this approach will not only set a new direction in the resilient, SLO-driven adoption of green energy but also enable us to manage future sustainable systems in ways that were not possible before. 
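Posing sustainability, resilience, and performance as one joint optimization, as this record proposes, can be illustrated with a toy trade-off: a higher green-energy share cuts carbon but, because green supply is less stable, raises SLO-violation risk. All weights and models below are assumptions for illustration, not the paper's formulation.

```python
# Toy joint objective: carbon cost of the fossil share plus an SLO-risk
# penalty that grows with reliance on unstable green supply.

def joint_objective(green_frac, w_carbon=1.0, w_slo=20.0):
    carbon = (1.0 - green_frac) * 10.0   # fossil share drives emissions
    slo_risk = green_frac ** 2           # instability risk grows with green share
    return w_carbon * carbon + w_slo * slo_risk

# Sweep the green-energy fraction and pick the minimizer of the joint cost.
candidates = [i / 100 for i in range(101)]
best_frac = min(candidates, key=joint_objective)
```

The interesting point the toy makes is that under instability the optimum is an interior mix, not 100% green, which is why the paper argues for continuous ML-driven optimization rather than a fixed policy.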
    Fast radio bursts (FRBs) are transient radio signals of extragalactic origin that are subject to propagation effects such as dispersion and scattering. It follows then that these signals hold information regarding the medium they have traversed and are hence useful as cosmological probes of the Universe. Recently, FRBs were used to make an independent measure of the Hubble constant H₀, promising to resolve the Hubble tension given a sufficient number of detected FRBs. Such cosmological studies are dependent on FRB population statistics, cosmological parameters, and detection biases, and thus it is important to accurately characterize each of these. In this work, we empirically characterize the sensitivity of the Fast Real-time Engine for Dedispersing Amplitudes (FREDDA), which is the current detection system for the Australian Square Kilometre Array Pathfinder (ASKAP). We coherently redisperse high-time-resolution data of 13 ASKAP-detected FRBs and inject them into FREDDA to determine the recovered signal-to-noise ratios as a function of dispersion measure. We find that for 11 of the 13 FRBs, these results are consistent with injecting idealized pulses. Approximating this sensitivity function with theoretical predictions results in a systematic error of 0.3 km s⁻¹ Mpc⁻¹ on H₀ when it is the only free parameter. Allowing additional parameters to vary could increase this systematic by up to ∼1 km s⁻¹ Mpc⁻¹. We estimate that this systematic will not be relevant until ∼400 localized FRBs have been detected, but will likely be significant in resolving the Hubble tension. 
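The dispersion that both FRB records here rely on follows the standard cold-plasma delay law: the arrival-time delay scales linearly with dispersion measure (DM) and with the inverse square of the observing frequency, Δt ≈ 4.149 ms × DM × (ν_lo⁻² − ν_hi⁻²) for DM in pc cm⁻³ and ν in GHz. A small sketch; the example band is only roughly representative of ASKAP's FRB searches.

```python
# Cold-plasma dispersion delay across an observing band.

def dispersion_delay_ms(dm, f_lo_ghz, f_hi_ghz):
    """Delay of the band's low-frequency edge relative to its high edge.

    dm is the dispersion measure in pc cm^-3; frequencies are in GHz.
    4.149 ms GHz^2 cm^3 pc^-1 is the standard dispersion constant.
    """
    return 4.149 * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)

# e.g. a DM of 500 pc cm^-3 swept across a roughly 1.1-1.4 GHz band
delay = dispersion_delay_ms(500.0, 1.1, 1.4)
```

This quadratic frequency dependence is exactly what dedispersion engines such as FREDDA search over, and why the recovered signal-to-noise ratio depends on how finely trial DMs sample it.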
  9. Fast radio bursts (FRBs) are millisecond-duration pulses of radio emission originating from extragalactic distances. Radio dispersion is imparted on each burst by intervening plasma, mostly located in the intergalactic medium. In this work, we observe the burst FRB 20220610A and localize it to a morphologically complex host galaxy system at redshift 1.016 ± 0.002. The burst redshift and dispersion measure are consistent with passage through a substantial column of plasma in the intergalactic medium and extend the relationship between those quantities measured at lower redshift. The burst shows evidence for passage through additional turbulent magnetized plasma, potentially associated with the host galaxy. We use the burst energy of 2 × 10⁴² erg to revise the empirical maximum energy of an FRB. 