-
Gibbons, Phillip B.; Pekhimenko, Gennady; De Sa, Christopher (Ed.)
The emergence of ML in various cloud system management tasks (e.g., workload autoscaling and job scheduling) has become a core driver of ML-centric cloud platforms. However, there are still numerous algorithmic and systems challenges that prevent ML-centric cloud platforms from being production-ready. In this paper, we focus on the challenges of model performance variability and costly model retraining, introduced by dynamic workload patterns and heterogeneous applications and infrastructures in cloud environments. To address these challenges, we present FLASH, an extensible framework for fast model adaptation in ML-based system management tasks. We show how FLASH leverages existing ML agents and their training data to learn to generalize across applications/environments with meta-learning. FLASH can be easily integrated with an existing ML-based system management agent with a unified API. We demonstrate the use of FLASH by implementing three existing ML agents that manage (1) resource configurations, (2) autoscaling, and (3) server power. Our experiments show that FLASH enables fast adaptation to new, previously unseen applications/environments (e.g., 5.5× faster than transfer learning in the autoscaling task), indicating significant potential for adopting ML-centric cloud platforms in production.
Free, publicly-accessible full text available September 1, 2025
-
Begnum, Kyrre; Border, Charles (Ed.)
With the increasing popularity of large deep learning model serving workloads, there is a pressing need to reduce the energy consumption of a model-serving cluster while still satisfying throughput or model-serving latency requirements. Model multiplexing approaches such as model parallelism, model placement, replication, and batching aim to optimize model-serving performance. However, they fall short of leveraging the GPU frequency scaling opportunity for power saving. In this paper, we demonstrate (1) the benefits of GPU frequency scaling in power saving for model serving; and (2) the necessity for co-design and optimization of fine-grained model multiplexing and GPU frequency scaling. We explore the co-design space and present a novel power-aware model-serving system, μ-Serve. μ-Serve is a model-serving framework that optimizes the power consumption and model-serving latency/throughput of serving multiple ML models efficiently in a homogeneous GPU cluster. Evaluation results on production workloads show that μ-Serve achieves 1.2–2.6× power saving by dynamic GPU frequency scaling (up to 61% reduction) without SLO attainment violations.
Free, publicly-accessible full text available September 1, 2025
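To relate the two figures quoted for μ-Serve, the short sketch below converts a power-saving factor into a percentage reduction. It assumes that an "N× power saving" means baseline power divided by optimized power equals N; the abstract does not define the term, so this reading and the function name `reduction_from_saving_factor` are illustrative assumptions only.

```python
def reduction_from_saving_factor(factor: float) -> float:
    """Fractional power reduction implied by an N-fold power saving.

    Assumption (not stated in the abstract): factor = baseline / optimized.
    """
    if factor <= 0:
        raise ValueError("saving factor must be positive")
    return 1.0 - 1.0 / factor


if __name__ == "__main__":
    # The two endpoints of the 1.2-2.6x range quoted in the abstract.
    for factor in (1.2, 2.6):
        reduction = reduction_from_saving_factor(factor)
        print(f"{factor}x saving -> {reduction:.1%} reduction")
```

Under this reading, the upper end of the range (2.6×) corresponds to a reduction of roughly 61.5%, which is consistent with the "up to 61% reduction" quoted above.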
-
Oh, A; Naumann, T; Globerson, A; Saenko, K; Hardt, M; Levine, S (Ed.)
Multi-agent reinforcement learning (MARL) has primarily focused on solving a single task in isolation, while in practice the environment is often evolving, leaving many related tasks to be solved. In this paper, we investigate the benefits of meta-learning in solving multiple MARL tasks collectively. We establish the first line of theoretical results for meta-learning in a wide range of fundamental MARL settings, including learning Nash equilibria in two-player zero-sum Markov games and Markov potential games, as well as learning coarse correlated equilibria in general-sum Markov games. Under natural notions of task similarity, we show that meta-learning achieves provably sharper convergence to various game-theoretical solution concepts than learning each task separately. As an important intermediate step, we develop multiple MARL algorithms with initialization-dependent convergence guarantees. Such algorithms integrate optimistic policy mirror descent with stage-based value updates, and their refined convergence guarantees (nearly) recover the best known results even when a good initialization is unknown. To the best of our knowledge, such results are also new and might be of independent interest. We further provide numerical simulations to corroborate our theoretical findings.
Free, publicly-accessible full text available April 1, 2025
-
Free, publicly-accessible full text available November 1, 2024
-
nd (Ed.)
This paper addresses the urgent need to transition to global net-zero carbon emissions by 2050 while retaining the ability to meet joint performance and resilience objectives. The focus is on computing infrastructures, such as hyperscale cloud datacenters, that consume significant power and thus produce increasing amounts of carbon emissions. Our goal is to (1) optimize the usage of green energy sources (e.g., solar energy), which are desirable but expensive and relatively unstable, and (2) continuously reduce the use of fossil fuels, which have a lower cost but a significant negative societal impact. Meanwhile, cloud datacenters strive to meet their customers’ requirements, e.g., service-level objectives (SLOs) in application latency or throughput, which are impacted by infrastructure resilience and availability. We propose a scalable formulation that combines sustainability, cloud resilience, and performance as a joint optimization problem with multiple interdependent objectives to address these issues holistically. Given the complexity and dynamicity of the problem, machine learning (ML) approaches, such as reinforcement learning, are essential for achieving continuous optimization. Our study highlights the challenges of green energy instability, which necessitates innovative ML-centric solutions across heterogeneous infrastructures to manage the transition towards green computing. Underlying the ML-centric solutions must be methods to combine classic system resilience techniques with innovations in real-time ML resilience (not addressed heretofore). We believe that this approach will not only set a new direction in the resilient, SLO-driven adoption of green energy but also enable us to manage future sustainable systems in ways that were not possible before.
Free, publicly-accessible full text available January 1, 2025
-
Fast radio bursts (FRBs) are transient radio signals of extragalactic origins that are subjected to propagation effects such as dispersion and scattering. It follows then that these signals hold information regarding the medium they have traversed and are hence useful as cosmological probes of the Universe. Recently, FRBs were used to make an independent measure of the Hubble constant $H_0$, promising to resolve the Hubble tension given a sufficient number of detected FRBs. Such cosmological studies are dependent on FRB population statistics, cosmological parameters, and detection biases, and thus it is important to accurately characterize each of these. In this work, we empirically characterize the sensitivity of the Fast Real-time Engine for Dedispersing Amplitudes (FREDDA) which is the current detection system for the Australian Square Kilometre Array Pathfinder (ASKAP). We coherently redisperse high-time resolution data of 13 ASKAP-detected FRBs and inject them into FREDDA to determine the recovered signal-to-noise ratios as a function of dispersion measure. We find that for 11 of the 13 FRBs, these results are consistent with injecting idealized pulses. Approximating this sensitivity function with theoretical predictions results in a systematic error of 0.3 km s$^{-1}$ Mpc$^{-1}$ on $H_0$ when it is the only free parameter. Allowing additional parameters to vary could increase this systematic by up to $\sim 1$ km s$^{-1}$ Mpc$^{-1}$. We estimate that this systematic will not be relevant until ∼400 localized FRBs have been detected, but will likely be significant in resolving the Hubble tension.
-
Fast radio bursts (FRBs) are millisecond-duration pulses of radio emission originating from extragalactic distances. Radio dispersion is imparted on each burst by intervening plasma, mostly located in the intergalactic medium. In this work, we observe the burst FRB 20220610A and localize it to a morphologically complex host galaxy system at redshift 1.016 ± 0.002. The burst redshift and dispersion measure are consistent with passage through a substantial column of plasma in the intergalactic medium and extend the relationship between those quantities measured at lower redshift. The burst shows evidence for passage through additional turbulent magnetized plasma, potentially associated with the host galaxy. We use the burst energy of $2 \times 10^{42}$ erg to revise the empirical maximum energy of an FRB.
Free, publicly-accessible full text available October 20, 2024
-
We present the discovery of FRB 20210410D with the MeerKAT radio interferometer in South Africa, as part of the MeerTRAP commensal project. FRB 20210410D has a dispersion measure DM = 578.78 ± 2 ${\rm pc \, cm^{-3}}$ and was localized to subarcsec precision in the 2 s images made from the correlation data products. The localization enabled the association of the FRB with an optical galaxy at z = 0.1415, which when combined with the DM places it above the 3σ scatter of the Macquart relation. We attribute the excess DM to the host galaxy after accounting for contributions from the Milky Way’s interstellar medium and halo, and the combined effects of the intergalactic medium and intervening galaxies. This is the first FRB that is not associated with a dwarf galaxy to exhibit a likely large host galaxy DM contribution. We do not detect any continuum radio emission at the FRB position or from the host galaxy down to a 3σ rms of 14.4 $\mu$Jy beam$^{-1}$. The FRB has a scattering delay of $29.4^{+2.8}_{-2.7}$ ms at 1 GHz, and exhibits candidate subpulses in the spectrum, which hint at the possibility of it being a repeating FRB. Although not constraining, we note that this FRB has not been seen to repeat in 7.28 h at 1.3 GHz with MeerKAT, 3 h at 2.4 GHz with Murriyang, and 5.7 h at simultaneous 2.3 GHz and 8.4 GHz observations with the Deep Space Network. We encourage further follow-up to establish a possible repeating nature.
-
Context. Fast radio bursts (FRBs) are extremely energetic pulses of millisecond duration and unknown origin. To understand the phenomenon that emits these pulses, targeted and untargeted searches have been performed for multiwavelength counterparts, including the optical. Aims. The objective of this work is to search for optical transients at the positions of eight well-localized (< 1″) FRBs after the arrival of the burst on different timescales (typically at one day, several months, and one year after FRB detection). We then compare this with known optical light curves to constrain progenitor models. Methods. We used the Las Cumbres Observatory Global Telescope (LCOGT) network to promptly take images with its network of 23 telescopes working around the world. We used a template subtraction technique to analyze all the images collected at differing epochs. We have divided the difference images into two groups: in one group we use the image of the last epoch as a template, and in the other group we use the image of the first epoch as a template. We then searched for optical transients at the localizations of the FRBs in the template-subtracted images. Results. We have found no optical transients and have therefore set limiting magnitudes to the optical counterparts. Typical limits in apparent and absolute magnitudes for our LCOGT data are ∼22 and −19 mag in the r band, respectively. We have compared our limiting magnitudes with light curves of super-luminous supernovae (SLSNe), Type Ia supernovae (SNe Ia), supernovae associated with gamma-ray bursts (GRB-SNe), a kilonova, and tidal disruption events (TDEs). Conclusions. Assuming that the FRB emission coincides with the time of explosion of these transients, we rule out associations with SLSNe (at the ∼99.9% confidence level) and the brightest subtypes of SNe Ia, GRB-SNe, and TDEs (at a similar confidence level). However, we cannot exclude scenarios where FRBs are directly associated with the faintest of these subtypes or with kilonovae.
-
Free, publicly-accessible full text available August 29, 2025