skip to main content


Title: On Max-Min Fairness of Completion Times for Multi-Task Job Scheduling
We study the max-min fairness of multi-task jobs in distributed computing platforms. We consider a setting where each job consists of a set of parallel tasks that need to be processed on different servers, and the job is completed once all its tasks finish processing. Each job is associated with a utility which is a decreasing function of its completion time, and captures how sensitive it is to latency. The objective is to schedule tasks in a way that achieves max-min fairness for jobs' utilities, i.e., an optimal schedule in which any attempt to improve the utility of a job necessarily results in hurting the utility of some other job with smaller or equal utility. We first show a strong result regarding NP-hardness of finding the max-min fair vector of job utilities. The implication of this result is that achieving max-min fairness in many other distributed scheduling problems (e.g., coflow scheduling) is NP-hard. We then proceed to define two notions of approximation solutions: one based on finding a certain number of elements of the max-min fair vector, and the other based on a single-objective optimization whose solution gives the max-min fair vector. We develop scheduling algorithms that provide guarantees under these approximation notions, using dynamic programming and random perturbation of tasks' processing times. We verify the performance of our algorithms through extensive simulations, using a real traffic trace from a large Google cluster.  more » « less
Award ID(s):
1652115 1717867
PAR ID:
10249199
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2020 IFIP Networking Conference (Networking)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the max-min fairness of multi-task jobs in distributed computing platforms. We consider a setting where each job consists of a set of parallel tasks that need to be processed on different servers, and the job is completed once all its tasks finish processing. Each job is associated with a utility which is a decreasing function of its completion time, and captures how sensitive it is to latency. The objective is to schedule tasks in a way that achieves max-min fairness for jobs’ utilities, i.e., an optimal schedule in which any attempt to improve the utility of a job necessarily results in hurting the utility of some other job with smaller or equal utility.We first show a strong result regarding NP-hardness of finding the max-min fair vector of job utilities. The implication of this result is that achieving max-min fairness in many other distributed scheduling problems (e.g., coflow scheduling) is NP-hard. We then proceed to define two notions of approximation solutions: one based on finding a certain number of elements of the max-min fair vector, and the other based on a single-objective optimization whose solution gives the max-min fair vector. We develop scheduling algorithms that provide guarantees under these approximation notions, using dynamic programming and random perturbation of tasks’ processing times. We verify the performance of our algorithms through extensive simulations, using a real traffic trace from a large Google cluster. 
    more » « less
  2. We first consider the static problem of allocating resources to (i.e., scheduling) multiple distributed application frameworks, possibly with different priorities and server preferences, in a private cloud with heterogeneous servers. Several fair scheduling mechanisms have been proposed for this purpose. We extend prior results on max-min fair (MMF) and proportional fair (PF) scheduling to this constrained multiresource and multiserver case for generic fair scheduling criteria. The task efficiencies (a metric related to proportional fairness) of max- min fair allocations found by progressive filling are compared by illustrative examples. In the second part of this paper, we consider the online problem (with framework churn) by implementing variants of these schedulers in Apache Mesos using progressive filling to dynamically approximate max-min fair allocations. We evaluate the implemented schedulers in terms of overall execution time of realistic distributed Spark workloads. Our experiments show that resource efficiency is improved and execution times are reduced when the scheduler is “server specific” or when it leverages characterized required resources of the workloads (when known). 
    more » « less
  3. Motivated by cloud computing, we study a market-based approach for job scheduling on multiple machines where users have hard deadlines and prefer earlier completion times. In our model, completing a job provides a benefit equal to its present value, i.e., the value discounted to the time when the job finishes. Users submit job requirements to the cloud provider who non-preemptively schedules jobs to maximize the social welfare, i.e., the sum of present values of completed jobs. Using a simple and fast greedy algorithm, we obtain a 1+s/(s−1) approximation to the optimal schedule, where s>1 is the minimum ratio of a job’s deadline to processing time. Building on our approximation algorithm, we construct a pricing rule to incentivize users to truthfully report all job requirements. 
    more » « less
  4. Motivated by modern parallel computing applications, we consider the problem of scheduling parallel-task jobs with heterogeneous resource requirements in a cluster of machines. Each job consists of a set of tasks that can be processed in parallel; however, the job is considered completed only when all its tasks finish their processing, which we refer to as the synchronization constraint. Furthermore, assignment of tasks to machines is subject to placement constraints, that is, each task can be processed only on a subset of machines, and processing times can also be machine dependent. Once a task is scheduled on a machine, it requires a certain amount of resource from that machine for the duration of its processing. A machine can process (pack) multiple tasks at the same time; however, the cumulative resource requirement of the tasks should not exceed the machine’s capacity. Our objective is to minimize the weighted average of the jobs’ completion times. The problem, subject to synchronization, packing, and placement constraints, is NP-hard, and prior theoretical results only concern much simpler models. For the case that migration of tasks among the placement-feasible machines is allowed, we propose a preemptive algorithm with an approximation ratio of [Formula: see text]. In the special case that only one machine can process each task, we design an algorithm with an improved approximation ratio of four. Finally, in the case that migrations (and preemptions) are not allowed, we design an algorithm with an approximation ratio of 24. Our algorithms use a combination of linear program relaxation and greedy packing techniques. We present extensive simulation results, using a real traffic trace, that demonstrate that our algorithms yield significant gains over the prior approaches. 
    more » « less
  5. null (Ed.)
    Abstract We consider a natural generalization of classical scheduling problems to a setting in which using a time unit for processing a job causes some time-dependent cost, the time-of-use tariff, which must be paid in addition to the standard scheduling cost. We focus on preemptive single-machine scheduling and two classical scheduling cost functions, the sum of (weighted) completion times and the maximum completion time, that is, the makespan. While these problems are easy to solve in the classical scheduling setting, they are considerably more complex when time-of-use tariffs must be considered. We contribute optimal polynomial-time algorithms and best possible approximation algorithms. For the problem of minimizing the total (weighted) completion time on a single machine, we present a polynomial-time algorithm that computes for any given sequence of jobs an optimal schedule, i.e., the optimal set of time slots to be used for preemptively scheduling jobs according to the given sequence. This result is based on dynamic programming using a subtle analysis of the structure of optimal solutions and a potential function argument. With this algorithm, we solve the unweighted problem optimally in polynomial time. For the more general problem, in which jobs may have individual weights, we develop a polynomial-time approximation scheme (PTAS) based on a dual scheduling approach introduced for scheduling on a machine of varying speed. As the weighted problem is strongly NP-hard, our PTAS is the best possible approximation we can hope for. For preemptive scheduling to minimize the makespan, we show that there is a comparably simple optimal algorithm with polynomial running time. This is true even in a certain generalized model with unrelated machines. 
    more » « less