-
Modern distributed machine learning (ML) training workloads benefit significantly from leveraging GPUs. However, significant contention ensues when multiple such workloads are run atop a shared cluster of GPUs. A key question is how to fairly apportion GPUs across workloads. We find that established cluster scheduling disciplines are a poor fit because of ML workloads' unique attributes: ML jobs have long-running tasks that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. We propose Themis, a new scheduling framework for ML training workloads. Its GPU allocation policy enforces that ML workloads complete in a finish-time fair manner, a new notion we introduce. To capture placement sensitivity and ensure efficiency, Themis uses a two-level scheduling architecture where ML workloads bid on available resources that are offered in an auction run by a central arbiter. Our auction design allocates GPUs to winning bids by trading off fairness for efficiency in the short term while ensuring finish-time fairness in the long term. Our evaluation on a production trace shows that Themis can improve fairness by more than 2.25X and is ~5% to 250% more cluster efficient in comparison to state-of-the-art schedulers.
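As a rough illustration of the finish-time fairness idea, the sketch below assumes a simple ratio metric: a job's projected finish time on the shared cluster divided by its finish time on an exclusive, equal share of the cluster. The abstract does not state the formula, so this definition and the names `finish_time_fairness`, `t_shared`, `t_ideal`, and `most_unfairly_treated` are hypothetical.

```python
from __future__ import annotations


def finish_time_fairness(t_shared: float, t_ideal: float) -> float:
    """Assumed metric: shared-cluster finish time over ideal finish time.

    A value above 1.0 means the job finishes later than it would have
    on its own equal share of the cluster.
    """
    if t_ideal <= 0:
        raise ValueError("t_ideal must be positive")
    return t_shared / t_ideal


def most_unfairly_treated(jobs: dict[str, tuple[float, float]]) -> list[str]:
    """Order job names from worst to best ratio (hypothetical helper).

    `jobs` maps a job name to (t_shared, t_ideal).
    """
    return sorted(
        jobs,
        key=lambda name: finish_time_fairness(*jobs[name]),
        reverse=True,
    )
```

Under this assumption, a scheduler aiming for long-term fairness would tend to offer freed GPUs first to the jobs at the front of the returned list.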
-
The increased use of micro-services to build web applications has spurred the rapid growth of Function-as-a-Service (FaaS) or serverless computing platforms. While FaaS simplifies provisioning and scaling for application developers, it introduces new challenges in resource management that need to be handled by the cloud provider. Our analysis of popular serverless workloads indicates that schedulers need to handle functions that are very short-lived, have unpredictable arrival patterns, and require expensive setup of sandboxes. The challenge of running a large number of such functions in a multi-tenant cluster makes existing scheduling frameworks unsuitable. We present Archipelago, a platform that enables low latency request execution in a multi-tenant serverless setting. Archipelago views each application as a DAG of functions, and every DAG is associated with a latency deadline. Archipelago achieves its per-DAG request latency goals by: (1) partitioning a given cluster into a number of smaller worker pools, and associating each pool with a semi-global scheduler (SGS), (2) using a latency-aware scheduler within each SGS along with proactive sandbox allocation to reduce overheads, and (3) using a load balancing layer to route requests for different DAGs to the appropriate SGS, and automatically scale the number of SGSs per DAG. Our testbed results show that Archipelago meets the latency deadline for more than 99% of realistic application request workloads, and reduces tail latencies by up to 36X compared to state-of-the-art serverless platforms.
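One way to picture a latency-aware scheduler inside a single SGS is a queue ordered by slack, that is, the time left before the deadline after subtracting the estimated run time. The abstract only says the scheduler is latency-aware, so the least-slack-first policy and the names `Request`, `make_request`, and `SlackQueue` below are assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    # Only slack takes part in ordering; dag_id is carried along as payload.
    slack: float
    dag_id: str = field(compare=False)


def make_request(dag_id: str, deadline: float, now: float, est_runtime: float) -> Request:
    """Build a request whose slack is deadline - now - estimated run time."""
    return Request(slack=deadline - now - est_runtime, dag_id=dag_id)


class SlackQueue:
    """Min-heap that always yields the request with the least slack (assumed policy)."""

    def __init__(self) -> None:
        self._heap: list = []

    def push(self, req: Request) -> None:
        heapq.heappush(self._heap, req)

    def pop(self) -> Request:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)
```

With this ordering, a request that is close to missing its deadline is dispatched ahead of one that arrived earlier but still has ample time.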
-
We present the full panchromatic afterglow light-curve data of GW170817, including new radio data as well as archival optical and X-ray data, between 0.5 and 940 days post-merger. By compiling all archival data and reprocessing a subset of it, we have evaluated the impact of differences in data processing or flux determination methods used by different groups and attempted to mitigate these differences to provide a more uniform data set. Simple power-law fits to the uniform afterglow light curve indicate a $t^{0.86 \pm 0.04}$ rise, a $t^{-1.92 \pm 0.12}$ decline, and a peak occurring at 155 ± 4 days. The afterglow is optically thin throughout its evolution, consistent with a single spectral index (−0.584 ± 0.002) across all epochs. This gives a precise and updated estimate of the electron power-law index, p = 2.168 ± 0.004. By studying the diffuse X-ray emission from the host galaxy, we place a conservative upper limit on the hot ionized interstellar medium density, $<0.01\ \mathrm{cm}^{-3}$, consistent with previous afterglow studies. Using the late-time afterglow data we rule out any long-lived neutron star remnant having a magnetic field strength between $10^{10.4}$ and $10^{16}$ G. Our fits to the afterglow data using an analytical model that includes Very Long Baseline Interferometry proper motion from Mooley et al., and a structured jet model that ignores the proper motion, indicate that the proper-motion measurement needs to be considered when seeking an accurate estimate of the viewing angle.
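The quoted electron index can be checked against the quoted spectral index. The sketch below assumes the standard optically thin synchrotron relation between the characteristic and cooling frequencies, β = −(p − 1)/2, which the abstract does not state explicitly; the function name `electron_index_from_spectral_index` is hypothetical.

```python
from __future__ import annotations


def electron_index_from_spectral_index(beta: float, beta_err: float) -> tuple[float, float]:
    """Invert the assumed relation beta = -(p - 1) / 2 to get p = 1 - 2 * beta.

    The uncertainty follows from linear error propagation: sigma_p = 2 * sigma_beta.
    """
    p = 1.0 - 2.0 * beta
    p_err = 2.0 * abs(beta_err)
    return p, p_err


# Values taken from the abstract: spectral index -0.584 +/- 0.002.
p, p_err = electron_index_from_spectral_index(-0.584, 0.002)
print(f"p = {p:.3f} ± {p_err:.3f}")  # prints "p = 2.168 ± 0.004"
```

Under that assumption, the result matches the value given in the abstract, p = 2.168 ± 0.004.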