skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Libra and the Art of Task Sizing in Big-Data Analytic Systems
Despite extensive investigation of job scheduling in data-intensive computation frameworks, less consideration has been given to optimizing job partitioning for resource utilization and efficient processing. Instead, partitioning and job sizing are a form of dark art, typically left to developer intuition and trial-and-error style experimentation. In this work, we propose that just as job scheduling and resource allocation are out-sourced to a trusted mechanism external to the workload, so too should be the responsibility for partitioning data as a determinant for task size. Job partitioning essentially involves determining the partition sizes to match the resource allocation at the finest granularity. This is a complex, multi-dimensional problem that is highly application specific: resource allocation, computational runtime, shuffle and reduce communication requirements, and task startup overheads all have strong influence on the most effective task size for efficient processing. Depending on the partition size, the job completion time can differ by as much as 10 times! Fortunately, we observe a general trend underlying the tradeoff between full resource utilization and system overhead across different settings. The optimal job partition size balances these two conflicting forces. Given this trend, we design Libra to automate job partitioning as a framework extension. We integrate Libra with Spark and evaluate its performance on EC2. Compared to state-of-the-art techniques, Libra can reduce the individual job execution time by 25% to 70%.  more » « less
Award ID(s):
1815115
PAR ID:
10122203
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Symposium on Cloud Computing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This work presents a framework for estimating job wait times in High-Performance Computing (HPC) scheduling queues, leverag- ing historical job scheduling data and real-time system metrics. Using machine learning techniques, specifically Random Forest and Multi-Layer Perceptron (MLP) models, we demonstrate high accuracy in predicting wait times, achieving 94.2% reliability within a 10-minute error margin. The framework incorporates key fea- tures such as requested resources, queue occupancy, and system utilization, with ablation studies revealing the significance of these features. Additionally, the framework offers users wait time esti- mates for different resource configurations, enabling them to select optimal resources, reduce delays, and accelerate computational workloads. Our approach provides valuable insights for both users and administrators to optimize job scheduling, contributing to more efficient resource management and faster time to scientific results. 
    more » « less
  2. Apache Mesos, a cluster-wide resource manager, is widely deployed in massive scale at several Clouds and Data Centers. Mesos aims to provide high cluster utilization via fine grained resource co-scheduling and resource fairness among multiple users through Dominant Resource Fairness (DRF) based allocation. DRF takes into account different resource types (CPU, Memory, Disk I/O) requested by each application and determines the share of each cluster resource that could be allocated to the applications. Mesos has adopted a two-level scheduling policy: (1) DRF to allocate resources to competing frameworks and (2) task level scheduling by each framework for the resources allocated during the previous step. We have conducted experiments in a local Mesos cluster when used with frameworks such as Apache Aurora, Marathon, and our own framework Scylla, to study resource fairness and cluster utilization. Experimental results show how informed decision regarding second level scheduling policy of frameworks and attributes like offer holding period, offer refusal cycle and task arrival rate can reduce unfair resource distribution. Bin-Packing scheduling policy on Scylla with Marathon can reduce unfair allocation from 38% to 3%. By reducing unused free resources in offers we bring down the unfairness from to 90% to 28%. We also show the effect of task arrival rate to reduce the unfairness from 23% to 7%. 
    more » « less
  3. With the technology trend of hardware and workload consolidation for embedded systems and the rapid development of edge computing, there has been increasing interest in supporting parallel real-time tasks to better utilize the multi-core platforms while meeting the stringent real-time constraints. For parallel real-time tasks, the federated scheduling paradigm, which assigns each parallel task a set of dedicated cores, achieves good theoretical bounds by ensuring exclusive use of processing resources to reduce interferences. However, because cores share the last-level cache and memory bandwidth resources, in practice tasks may still interfere with each other despite executing on dedicated cores. Such resource interferences due to concurrent accesses can be even more severe for embedded platforms or edge servers, where the computing power and cache/memory space are limited. To tackle this issue, in this work, we present a holistic resource allocation framework for parallel real-time tasks under federated scheduling. Under our proposed framework, in addition to dedicated cores, each parallel task is also assigned with dedicated cache and memory bandwidth resources. Further, we propose a holistic resource allocation algorithm that well balances the allocation between different resources to achieve good schedulability. Additionally, we provide a full implementation of our framework by extending the federated scheduling system with Intel’s Cache Allocation Technology and MemGuard. Finally, we demonstrate the practicality of our proposed framework via extensive numerical evaluations and empirical experiments using real benchmark programs. 
    more » « less
  4. Real-time data stream processing at the edge is crucial for time-sensitive tasks within large-scale IoT systems. Task scheduling plays a key role in managing the Quality of Service (QoS), necessitating a prioritization system to distinguish between high and low-priority tasks, thus ensuring efficient data processing on edge nodes. Existing scheduling algorithms rigidly prioritize tasks deemed as high-priority, often at the expense of fairness and overall system efficiency. In this paper, we propose a Priority-aware Fair Task Scheduling (FTS-Hybrid) algorithm that addresses these challenges by managing priority based task execution in a controlled manner. Our task scheduling algorithm streamlines resource utilization and enhances system responsiveness, contributing to low latency and high throughput, outperforming competing techniques including First-Come-FirstServe (FCFS), Round Robin (RR), and Priority Scheduling (PS). We implemented FTS-Hybrid on Apache Storm and evaluated its performance using an open-source real-time IoT benchmark (RIoTBench). Experimental results show that the FTS-Hybrid algorithm reduces task execution latency by 24%, 31%, and 26% compared with FCFS, RR, and PS, respectively, by strategically mitigating queuing delays under dynamic workload conditions. 
    more » « less
  5. Traditional systems for allocating finite cluster resources among competing jobs have either aimed at providing fairness, relied on users to specify their resource requirements, or have estimated these requirements via surrogate metrics (e.g. CPU utilization). These approaches do not account for a job’s real world performance (e.g. P95 latency). Existing performance-aware systems use offline profiled data and/or are designed for specific allocation objectives. In this work, we argue that resource allocation systems should directly account for real-world performance and the varied allocation objectives of users. In this pursuit, we build Cilantro. At the core of Cilantro is an online learning mechanism which forms feedback loops with the jobs to estimate the resource to performance mappings and load shifts. This relieves users from the onerous task of job profiling and collects reliable real-time feedback. This is then used to achieve a variety of user-specified scheduling objectives. Cilantro handles the uncertainty in the learned models by adapting the underlying policy to work with confidence bounds. We demonstrate this in two settings. First, in a multi-tenant 1000 CPU cluster with 20 independent jobs, three of Cilantro’s policies outperform 9 other baselines on three different performance-aware scheduling objectives, improving user utilities by up to 1.2 − 3.7x. Second, in a microservices setting, where 160 CPUs must be distributed between 19 inter-dependent microservices, Cilantro outperforms 3 other baselines, reducing the end-to-end P99 latency to x0.57 the next best baseline. 
    more » « less