Title: Shuffle Scheduling for MapReduce Jobs Based on Periodic Network Status
MapReduce jobs need to shuffle a large amount of data over the network between mapper and reducer nodes. The shuffle time accounts for a large fraction of the total running time of a MapReduce job, so optimizing the makespan of the shuffle phase can greatly improve overall job performance. A large fraction of production jobs in data centers are recurring with predictable characteristics, and these recurring jobs split the network into periodic busy and idle time slots. With the future network status predictable, the shuffle data can be scheduled more effectively to reduce the makespan of the shuffle phase. In this paper, we formulate the shuffle scheduling problem with the aim of minimizing the makespan of the MapReduce shuffle phase by leveraging the predictable periodic network status. We then propose a simple yet effective network-aware shuffle scheduling algorithm (NAS) that reduces the number of idle time slots required to transfer the shuffle data, and thereby the shuffle makespan. We also prove that NAS is a 3/2-approximation algorithm for the shuffle scheduling problem when all future idle time slots have the same duration. Finally, we conduct simulation experiments. The results demonstrate that the proposed algorithm can effectively reduce the makespan of the MapReduce shuffle phase and increase network utilization.
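To make the setup concrete, here is a minimal sketch of one way shuffle transfers could be packed into predicted idle network slots: flows are placed largest-first into the earliest slots with remaining capacity, so that the index of the last slot used, a proxy for the shuffle makespan, stays small. The slot model, the capacities, and the first-fit rule are illustrative assumptions, not the paper's exact NAS algorithm.

```python
# A minimal sketch (not the authors' exact NAS algorithm): greedily pack
# shuffle transfers into predicted idle network slots, preferring earlier
# slots, so the last slot touched (a proxy for shuffle makespan) stays small.

def schedule_shuffle(flows, idle_slots):
    """flows: list of transfer sizes (e.g., GB).
    idle_slots: list of (start_time, capacity) pairs, sorted by start_time.
    Returns (assignment, index of last slot used)."""
    remaining = [cap for _, cap in idle_slots]
    assignment = {}          # flow index -> list of (slot index, amount)
    last_used = -1
    for i, size in sorted(enumerate(flows), key=lambda p: -p[1]):
        placed = []
        for s, _ in enumerate(idle_slots):
            if size == 0:
                break
            take = min(size, remaining[s])
            if take > 0:
                remaining[s] -= take
                size -= take
                placed.append((s, take))
                last_used = max(last_used, s)
        if size > 0:
            raise ValueError("idle capacity insufficient for flow %d" % i)
        assignment[i] = placed
    return assignment, last_used

flows = [40, 25, 10, 5]                 # shuffle data per mapper-reducer pair
slots = [(0, 30), (10, 30), (20, 30)]   # periodic idle slots of equal size
print(schedule_shuffle(flows, slots))
```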
Award ID(s):
1907472
PAR ID:
10185626
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
IEEE/ACM Transactions on Networking
Volume:
28
Issue:
4
ISSN:
1063-6692
Page Range / eLocation ID:
1832-1844
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Abstract We consider a natural generalization of classical scheduling problems to a setting in which using a time unit for processing a job causes some time-dependent cost, the time-of-use tariff, which must be paid in addition to the standard scheduling cost. We focus on preemptive single-machine scheduling and two classical scheduling cost functions, the sum of (weighted) completion times and the maximum completion time, that is, the makespan. While these problems are easy to solve in the classical scheduling setting, they are considerably more complex when time-of-use tariffs must be considered. We contribute optimal polynomial-time algorithms and best possible approximation algorithms. For the problem of minimizing the total (weighted) completion time on a single machine, we present a polynomial-time algorithm that computes for any given sequence of jobs an optimal schedule, i.e., the optimal set of time slots to be used for preemptively scheduling jobs according to the given sequence. This result is based on dynamic programming using a subtle analysis of the structure of optimal solutions and a potential function argument. With this algorithm, we solve the unweighted problem optimally in polynomial time. For the more general problem, in which jobs may have individual weights, we develop a polynomial-time approximation scheme (PTAS) based on a dual scheduling approach introduced for scheduling on a machine of varying speed. As the weighted problem is strongly NP-hard, our PTAS is the best possible approximation we can hope for. For preemptive scheduling to minimize the makespan, we show that there is a comparably simple optimal algorithm with polynomial running time. This is true even in a certain generalized model with unrelated machines. 
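As a concrete illustration of the makespan-plus-tariff trade-off described above, the sketch below assumes unit-length time slots, a known tariff per slot, and a total processing volume p that may be split arbitrarily (preemption): finishing by time T costs T plus the p cheapest tariffs in [0, T), and scanning over T finds the best balance. This mirrors the structure of the problem; it is not claimed to be the paper's algorithm.

```python
# A minimal sketch, assuming unit-length slots with per-slot tariffs and
# total processing volume p (preemption allowed). To finish by time T we
# must buy p slots within [0, T); the cheapest feasible choice is the p
# smallest tariffs there, and a scan over T trades makespan against tariff.

def best_makespan(p, tariffs):
    """Minimize T + (tariff paid) when p unit jobs must fit in slots [0, T)."""
    best = None
    for T in range(p, len(tariffs) + 1):
        cost = T + sum(sorted(tariffs[:T])[:p])
        if best is None or cost < best[0]:
            best = (cost, T)
    return best  # (total cost, makespan)

print(best_makespan(3, [5, 1, 9, 1, 2, 8]))  # -> (9, 5)
```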
  2.
    We propose a flexible low-complexity design (FLCD) of coded distributed computing (CDC) with empirical evaluation on Amazon Elastic Compute Cloud (Amazon EC2). CDC can expedite MapReduce-like computation by trading increased map computation for reduced communication load and shuffle time. A main novelty of FLCD is to utilize the design freedom in defining map and reduce functions to develop asymptotically homogeneous systems that support varying intermediate value (IV) sizes under a general MapReduce framework. Compared to existing designs with constant IV sizes, FLCD offers greater flexibility in adapting to network parameters and significantly reduces the implementation complexity by requiring fewer input files and shuffle groups. The FLCD scheme is the first proposed low-complexity CDC design that can operate on a network with an arbitrary number of nodes and computation load. We perform empirical evaluations of FLCD by executing the TeraSort algorithm on an Amazon EC2 cluster. This is the first time that theoretical predictions of the CDC shuffle time are validated by empirical evaluations. The evaluations demonstrate a 2.0× to 4.24× speedup compared to conventional uncoded MapReduce, a 12% to 52% reduction in total time, and a wider range of operating network parameters compared to existing CDC schemes.
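For context, the sketch below evaluates the classical CDC trade-off from the coded-computing literature that designs like FLCD build on: replicating each map task on r of the K nodes enables coded multicasts that cut the shuffle load by roughly a factor of r. The formula is the standard uncoded-versus-coded comparison, not an FLCD-specific expression.

```python
# A minimal sketch of the classical CDC trade-off: with K nodes and
# computation load r (each map task replicated on r nodes), coded
# multicasting reduces the normalized shuffle load from (1 - 1/K) to
# (1/r)(1 - r/K). Illustrative only; not FLCD's exact expressions.

def comm_load(K, r):
    uncoded = 1 - 1 / K            # fraction of IVs each node must fetch
    coded = (1 / r) * (1 - r / K)  # with r-fold map replication
    return uncoded, coded

for r in (1, 2, 4):
    u, c = comm_load(K=8, r=r)
    print(f"r={r}: uncoded load {u:.3f}, coded load {c:.3f}")
```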
  3. Abstract Railroads are maintained routinely by using various types of rail‐bound machines so as to achieve the longest possible rail life and reduce the safety risks associated with unanticipated rail failures. The rail maintenance routing and scheduling problem (RMRSP), which involves routing of multiple maintenance vehicles and scheduling of hundreds of maintenance jobs over a large‐scale network, is usually subject to various types of complex constraints and extremely difficult to solve. This article proposes a vehicle routing problem with time windows (VRPTW) formulation for RMRSP and develops a customized stepwise algorithm to solve the problem. A series of numerical experiments are conducted to demonstrate that the proposed algorithm works very effectively, significantly outperforming the state‐of‐the‐art commercial solver. The results of two real‐world instances from a Class I railroad company show that the proposed model and solution algorithm enable the expensive maintenance vehicles to achieve a higher level of utilization, that is, spending more time on working and less time on deadhead traveling. 
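As a small illustration of the time-window bookkeeping at the heart of any VRPTW formulation like the one above, the sketch below walks a candidate maintenance route, waits for each job's window to open, and rejects late arrivals. The job names, travel times, and windows are illustrative placeholders, not data from the paper.

```python
# A minimal sketch of VRPTW feasibility checking: traverse a route, wait
# for each job's time window to open, reject arrivals after it closes.
# All instance data below are hypothetical.

def route_feasible(route, travel, windows, service):
    """route: job sequence; travel[(a, b)]: travel time;
    windows[j] = (open, close); service[j]: job duration.
    Returns finishing time, or None if any window is violated."""
    t = 0.0
    prev = None
    for j in route:
        if prev is not None:
            t += travel[(prev, j)]
        t = max(t, windows[j][0])   # wait for the window to open
        if t > windows[j][1]:       # arrived after the window closed
            return None
        t += service[j]
        prev = j
    return t

travel = {("A", "B"): 2, ("B", "C"): 3}
windows = {"A": (0, 5), "B": (4, 8), "C": (10, 15)}
service = {"A": 1, "B": 2, "C": 1}
print(route_feasible(["A", "B", "C"], travel, windows, service))  # -> 11.0
```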
  4. Naor, Joseph; Buchbinder, Niv (Ed.)
    In the problem of scheduling with non-uniform communication delays, the input is a set of jobs with precedence constraints. Associated with every precedence constraint between a pair of jobs is a communication delay, the time duration the scheduler has to wait between the two jobs if they are scheduled on different machines. The objective is to assign the jobs to machines to minimize the makespan of the schedule. Despite being a fundamental problem in theory and a consequential problem in practice, the approximability of scheduling problems with communication delays is not very well understood. One of the top ten open problems in scheduling theory, in the influential list by Schuurman and Woeginger and its latest update by Bansal, asks if the problem admits a constant-factor approximation algorithm. In this paper, we answer this question in the negative by proving a logarithmic hardness for the problem under the standard complexity theory assumption that NP-complete problems do not admit quasi-polynomial-time algorithms. Our hardness result is obtained using a surprisingly simple reduction from a problem that we call Unique Machine Precedence constraints Scheduling (UMPS). We believe that this problem is of central importance in understanding the hardness of many scheduling problems and we conjecture that it is very hard to approximate. Among other things, our conjecture implies a logarithmic hardness of related machine scheduling with precedences, a long-standing open problem in scheduling theory and approximation algorithms. 
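To pin down the problem's semantics, the sketch below computes job finish times for a given machine assignment: a successor scheduled on a different machine must wait out the communication delay on the edge, while co-located jobs incur none. For simplicity it ignores contention between jobs on the same machine, so it gives a lower bound on the true makespan; the instance data are illustrative.

```python
# A minimal sketch of the problem's semantics: jobs with processing times,
# precedence edges carrying communication delays, and a machine assignment.
# If u -> v runs on different machines, v waits delay(u, v) after u ends.
# Ignores same-machine contention, so this is a makespan lower bound.

def earliest_finish(proc, preds, delay, machine):
    """proc[j]: processing time; preds[j]: predecessors of j;
    delay[(u, v)]: communication delay; machine[j]: assigned machine.
    Assumes the dict order of proc is a topological order."""
    finish = {}
    for j in proc:
        start = 0
        for u in preds.get(j, []):
            lag = delay[(u, j)] if machine[u] != machine[j] else 0
            start = max(start, finish[u] + lag)
        finish[j] = start + proc[j]
    return max(finish.values())

proc = {"a": 1, "b": 2, "c": 1}
preds = {"b": ["a"], "c": ["a", "b"]}
delay = {("a", "b"): 3, ("a", "c"): 3, ("b", "c"): 1}
machine = {"a": 0, "b": 1, "c": 1}
print(earliest_finish(proc, preds, delay, machine))  # -> 7
```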
  5. Due to a growing interest in deep learning applications [5], compute-intensive and long-running (hours to days) training jobs have become a significant component of datacenter workloads. A large fraction of these jobs is often exploratory, with the goal of determining the best model structure (e.g., the number of layers and channels in a convolutional neural network), hyperparameters (e.g., the learning rate), and data augmentation strategies for the target application. Notably, training jobs are often terminated early if their learning metrics (e.g., training and validation accuracy) are not converging, with only a few completing successfully. For this motivating application, we consider the problem of scheduling a set of jobs that can be terminated at predetermined checkpoints with known probabilities estimated from historical data. We prove that, in order to minimize the time to complete the first K successful jobs on a single server, optimal scheduling does not require preemption (even when preemption overhead is negligible) and provide an optimal policy; advantages of this policy are quantified through simulation.
    Related Work. While job scheduling has been investigated extensively in many scenarios (see [6] and [2] for surveys of recent results), most policies require that the cost of waiting times of each job be known at scheduling time; in contrast, in our setting the scheduler does not know which job will be the K-th successful job, and sojourn times of subsequent jobs do not contribute to the target metric. For example, [4, 3] minimize makespan (i.e., the time to complete all jobs) for known execution times and waiting time costs; similarly, Gittins index [1] and SR rank [7] minimize expected sojourn time of all jobs, i.e., both successfully completed jobs and jobs terminated early. Unfortunately, scheduling policies not distinguishing between these two types of jobs may favor jobs where the next stage is short and leads to early termination with high probability, which is an undesirable outcome in our applications of interest.
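A minimal simulation of this setting appears below: each job runs through fixed checkpoints, surviving each with a known probability, jobs are run sequentially (no preemption, consistent with the paper's result), and the clock stops once K jobs succeed. The ordering rule used, expected success probability per unit of total runtime, is a plausible greedy index chosen for illustration, not the paper's proven-optimal policy.

```python
# A minimal sketch simulating checkpointed jobs with known per-checkpoint
# survival probabilities, run sequentially until K succeed. The greedy
# ordering index below is an illustrative assumption, not the paper's
# proven-optimal policy.

import random

def time_to_k_successes(jobs, K, seed=0):
    """jobs: list of (stage_durations, stage_survival_probs) pairs."""
    rng = random.Random(seed)

    def index(job):  # success probability per unit of total runtime
        durs, probs = job
        p_succ = 1.0
        for p in probs:
            p_succ *= p
        return p_succ / sum(durs)

    t, done = 0.0, 0
    for durs, probs in sorted(jobs, key=index, reverse=True):
        survived = True
        for d, p in zip(durs, probs):
            t += d
            if rng.random() > p:   # terminated early at this checkpoint
                survived = False
                break
        done += survived
        if done >= K:
            return t
    return None  # fewer than K successes in this sample run

jobs = [([2, 2], [0.9, 0.5]), ([1, 3], [0.6, 0.6]), ([2, 1], [0.8, 0.9])]
print(time_to_k_successes(jobs, K=2))
```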