

Search for: All records

Creators/Authors contains: "Qiu, H"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Gibbons, Phillip B; Pekhimenko, Gennady; De Sa, Christopher (Ed.)
    The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent ML-centric cloud platforms from being production-ready. In this paper, we focus on the challenges of model performance variability and costly model retraining, introduced by dynamic workload patterns and heterogeneous applications and infrastructures in cloud environments. To address these challenges, we present FLASH, an extensible framework for fast model adaptation in ML-based system management tasks. We show how FLASH leverages existing ML agents and their training data to learn to generalize across applications/environments with meta-learning. FLASH can be easily integrated with an existing ML-based system management agent with a unified API. We demonstrate the use of FLASH by implementing three existing ML agents that manage (1) resource configurations, (2) autoscaling, and (3) server power. Our experiments show that FLASH enables fast adaptation to new, previously unseen applications/environments (e.g., 5.5× faster than transfer learning in the autoscaling task), indicating significant potential for adopting ML-centric cloud platforms in production. 
    Free, publicly-accessible full text available September 1, 2025
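The fast-adaptation idea in this record, meta-learning a shared initialization from existing agents' training data so that a new application or environment needs only a few gradient updates, can be sketched minimally. The sketch below is a generic first-order meta-learning toy (Reptile-style) on one-dimensional quadratic tasks; every name, constant, and model here is an illustrative assumption, not FLASH's actual algorithm or API.

```python
# Reptile-style first-order meta-learning over simple 1-D quadratic "tasks".
# Each task c stands in for one application/environment an ML agent manages.

def task_loss_grad(w, c):
    """Gradient of the per-task loss f_c(w) = (w - c)^2."""
    return 2.0 * (w - c)

def adapt(w, c, steps, lr=0.1):
    """Plain gradient descent on one task, starting from initialization w."""
    for _ in range(steps):
        w -= lr * task_loss_grad(w, c)
    return w

def meta_train(task_centers, meta_steps=200, inner_steps=5, meta_lr=0.2):
    """Reptile: nudge the shared init toward each task's adapted weights."""
    w0 = 0.0
    for i in range(meta_steps):
        c = task_centers[i % len(task_centers)]
        w_adapted = adapt(w0, c, inner_steps)
        w0 += meta_lr * (w_adapted - w0)
    return w0

centers = [2.0, 3.0, 4.0]            # three known "environments"
w_meta = meta_train(centers)          # shared meta-learned initialization
w_new = adapt(w_meta, 3.5, steps=3)   # few-step adaptation to an unseen task
```

Starting from the meta-learned initialization, a handful of updates suffices on the unseen task, which is the effect the abstract quantifies as faster adaptation than transfer learning.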
  3. Begnum, Kyrre; Border, Charles (Ed.)
    With the increasing popularity of large deep learning model-serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while satisfying throughput and model-serving latency requirements. Model multiplexing approaches such as model parallelism, model placement, replication, and batching aim to optimize model-serving performance. However, they fall short of leveraging the GPU frequency scaling opportunity for power saving. In this paper, we demonstrate (1) the benefits of GPU frequency scaling in power saving for model serving; and (2) the necessity for co-design and optimization of fine-grained model multiplexing and GPU frequency scaling. We explore the co-design space and present a novel power-aware model-serving system, μ-Serve. μ-Serve is a model-serving framework that optimizes the power consumption and model-serving latency/throughput of serving multiple ML models efficiently in a homogeneous GPU cluster. Evaluation results on production workloads show that μ-Serve achieves 1.2–2.6× power saving by dynamic GPU frequency scaling (up to 61% reduction) without SLO attainment violations. 
    Free, publicly-accessible full text available September 1, 2025
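The core intuition of power-aware frequency scaling in this record can be sketched as: pick the lowest GPU frequency (hence lowest power) whose predicted serving latency still meets the SLO. The latency and power models below are illustrative placeholders under simple scaling assumptions, not μ-Serve's actual models or policy.

```python
# Toy frequency selection: lowest clock that still satisfies the latency SLO.

def predict_latency_ms(freq_mhz, base_ms=20.0, ref_mhz=1500.0):
    # Assume latency scales roughly inversely with clock frequency.
    return base_ms * ref_mhz / freq_mhz

def predict_power_w(freq_mhz, k=1e-7):
    # Assume dynamic power grows superlinearly with frequency (~f^3).
    return k * freq_mhz ** 3

def pick_frequency(freqs_mhz, slo_ms):
    """Lowest frequency whose predicted latency meets the SLO."""
    feasible = [f for f in freqs_mhz if predict_latency_ms(f) <= slo_ms]
    return min(feasible) if feasible else max(freqs_mhz)

freqs = [600, 900, 1200, 1500]                 # hypothetical supported clocks
chosen = pick_frequency(freqs, slo_ms=30.0)     # 1200 MHz under these models
saving = 1 - predict_power_w(chosen) / predict_power_w(max(freqs))
```

Under the cubic power assumption, even a modest clock reduction yields a large power saving, which is why co-designing frequency scaling with model multiplexing is attractive.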
  5. The development of lithium-ion battery technology has ensured that battery thermal management systems are an essential component of the battery pack for next-generation energy storage systems. Using dielectric immersion cooling, researchers have demonstrated the ability to attain high heat transfer rates due to the direct contact between cells and the coolant. However, feedback control has not been widely applied to immersion cooling schemes. Furthermore, current research has not considered battery pack plant design when optimizing feedback control. Uncertainties are inherent in the cooling equipment, resulting in temperature and flow rate fluctuations. Hence, it is crucial to systematically consider these uncertainties during cooling system design to improve the performance and reliability of the battery pack. To fill this gap, we established a reliability-based control co-design optimization framework using machine learning for immersion cooled battery packs. We first developed an experimental setup for 21700 battery immersion cooling, and the experiment data were used to build a high-fidelity multiphysics finite element model. The model can precisely represent the electrical and thermal profile of the battery. We then developed surrogate models based on the finite element simulations in order to reduce computational cost. The reliability-based control co-design optimization was employed to find the best plant and control design for the cooling system, in which an outer optimization loop minimized the cooling system cost while an inner loop ensured battery pack reliability. Finally, an optimal cooling system design was obtained and validated, which showed a 90% saving in cooling system energy consumption. 
    Free, publicly-accessible full text available July 14, 2025
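The nested structure of the reliability-based control co-design described above, an outer loop minimizing cooling-system cost and an inner loop enforcing battery-pack reliability, can be sketched with toy models. The thermal and cost functions below are illustrative stand-ins, not the paper's finite element or surrogate models.

```python
# Bi-level co-design sketch: outer cost minimization over plant/control
# designs, inner feasibility check standing in for the reliability loop.
import itertools

def peak_temp_c(flow_lpm, inlet_c):
    # Toy thermal model: more coolant flow and a cooler inlet lower peak temp.
    return inlet_c + 30.0 / flow_lpm

def energy_cost(flow_lpm, inlet_c):
    # Toy cost model: pumping cost grows with flow; chilling cost grows as
    # the inlet temperature is pushed below ambient (25 C here).
    return 2.0 * flow_lpm + (25.0 - inlet_c)

def co_design(flows, inlets, temp_limit_c=45.0):
    """Outer loop: minimize cost over designs passing the inner check."""
    feasible = [(f, t) for f, t in itertools.product(flows, inlets)
                if peak_temp_c(f, t) <= temp_limit_c]  # inner reliability check
    return min(feasible, key=lambda d: energy_cost(*d))

best = co_design(flows=[1.0, 2.0, 4.0], inlets=[15.0, 20.0, 25.0])
```

The design choice mirrors the abstract: feasibility (reliability) is enforced inside the search rather than penalized afterward, so the returned optimum is guaranteed to respect the thermal limit.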
  6. Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
    Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in isolation, while in practice the environment is often evolving, leaving many related tasks to be solved. In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretical solution concepts than learning each task separately. As an important intermediate step, we develop multiple MARL algorithms with initialization-dependent convergence guarantees. Such algorithms integrate optimistic policy mirror descent with stage-based value updates, and their refined convergence guarantees (nearly) recover the best known results even when a good initialization is unknown. To the best of our knowledge, such results are also new and might be of independent interest. We further provide numerical simulations to corroborate our theoretical findings. 
  7.
    This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on the computing infrastructures, such as hyperscale cloud datacenters, that consume significant power, thus producing increasing amounts of carbon emissions. Our goal is to (1) optimize the usage of green energy sources (e.g., solar energy), which is desirable but expensive and relatively unstable, and (2) continuously reduce the use of fossil fuels, which have a lower cost but a significant negative societal impact. Meanwhile, cloud datacenters strive to meet their customers’ requirements, e.g., service-level objectives (SLOs) in application latency or throughput, which are impacted by infrastructure resilience and availability. We propose a scalable formulation that combines sustainability, cloud resilience, and performance as a joint optimization problem with multiple interdependent objectives to address these issues holistically. Given the complexity and dynamicity of the problem, machine learning (ML) approaches, such as reinforcement learning, are essential for achieving continuous optimization. Our study highlights the challenges of green energy instability, which necessitates innovative ML-centric solutions across heterogeneous infrastructures to manage the transition towards green computing. Underlying the ML-centric solutions must be methods to combine classic system resilience techniques with innovations in real-time ML resilience (not addressed heretofore). We believe that this approach will not only set a new direction in the resilient, SLO-driven adoption of green energy but also enable us to manage future sustainable systems in ways that were not possible before. 
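Posing sustainability, resilience, and performance as one joint optimization, as this record proposes, can be illustrated with a toy trade-off: a higher green-energy share cuts carbon but, because green supply is less stable, raises SLO-violation risk. All weights and models below are assumptions for illustration, not the paper's formulation.

```python
# Toy joint objective: carbon cost of the fossil share plus an SLO-risk
# penalty that grows with reliance on unstable green supply.

def joint_objective(green_frac, w_carbon=1.0, w_slo=20.0):
    carbon = (1.0 - green_frac) * 10.0   # fossil share drives emissions
    slo_risk = green_frac ** 2           # instability risk grows with green share
    return w_carbon * carbon + w_slo * slo_risk

# Sweep the green-energy fraction and pick the minimizer of the joint cost.
candidates = [i / 100 for i in range(101)]
best_frac = min(candidates, key=joint_objective)
```

The interesting point the toy makes is that under instability the optimum is an interior mix, not 100% green, which is why the paper argues for continuous ML-driven optimization rather than a fixed policy.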
    Fast radio bursts (FRBs) are transient radio signals of extragalactic origin that are subject to propagation effects such as dispersion and scattering. It follows then that these signals hold information regarding the medium they have traversed and are hence useful as cosmological probes of the Universe. Recently, FRBs were used to make an independent measure of the Hubble constant H₀, promising to resolve the Hubble tension given a sufficient number of detected FRBs. Such cosmological studies are dependent on FRB population statistics, cosmological parameters, and detection biases, and thus it is important to accurately characterize each of these. In this work, we empirically characterize the sensitivity of the Fast Real-time Engine for Dedispersing Amplitudes (FREDDA), which is the current detection system for the Australian Square Kilometre Array Pathfinder (ASKAP). We coherently redisperse high-time-resolution data of 13 ASKAP-detected FRBs and inject them into FREDDA to determine the recovered signal-to-noise ratios as a function of dispersion measure. We find that for 11 of the 13 FRBs, these results are consistent with injecting idealized pulses. Approximating this sensitivity function with theoretical predictions results in a systematic error of 0.3 km s⁻¹ Mpc⁻¹ on H₀ when it is the only free parameter. Allowing additional parameters to vary could increase this systematic by up to ∼1 km s⁻¹ Mpc⁻¹. We estimate that this systematic will not be relevant until ∼400 localized FRBs have been detected, but will likely be significant in resolving the Hubble tension. 
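The dispersion that both FRB records here rely on follows the standard cold-plasma delay law: the arrival-time delay scales linearly with dispersion measure (DM) and with the inverse square of the observing frequency, Δt ≈ 4.149 ms × DM × (ν_lo⁻² − ν_hi⁻²) for DM in pc cm⁻³ and ν in GHz. A small sketch; the example band is only roughly representative of ASKAP's FRB searches.

```python
# Cold-plasma dispersion delay across an observing band.

def dispersion_delay_ms(dm, f_lo_ghz, f_hi_ghz):
    """Delay of the band's low-frequency edge relative to its high edge.

    dm is the dispersion measure in pc cm^-3; frequencies are in GHz.
    4.149 ms GHz^2 cm^3 pc^-1 is the standard dispersion constant.
    """
    return 4.149 * dm * (f_lo_ghz ** -2 - f_hi_ghz ** -2)

# e.g. a DM of 500 pc cm^-3 swept across a roughly 1.1-1.4 GHz band
delay = dispersion_delay_ms(500.0, 1.1, 1.4)
```

This quadratic frequency dependence is exactly what dedispersion engines such as FREDDA search over, and why the recovered signal-to-noise ratio depends on how finely trial DMs sample it.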
  9. Fast radio bursts (FRBs) are millisecond-duration pulses of radio emission originating from extragalactic distances. Radio dispersion is imparted on each burst by intervening plasma, mostly located in the intergalactic medium. In this work, we observe the burst FRB 20220610A and localize it to a morphologically complex host galaxy system at redshift 1.016 ± 0.002. The burst redshift and dispersion measure are consistent with passage through a substantial column of plasma in the intergalactic medium and extend the relationship between those quantities measured at lower redshift. The burst shows evidence for passage through additional turbulent magnetized plasma, potentially associated with the host galaxy. We use the burst energy of 2 × 10⁴² erg to revise the empirical maximum energy of an FRB. 