

Title: Optimal Load Balancing with Locality Constraints
Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Edges thus represent locality constraints: a job can only be served at servers that hold certain data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies, Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue in which ties are broken in favor of the fastest servers. Under a "well-connected" graph condition, we show that JFSQ and JFIQ are asymptotically optimal in mean response time as the number of servers goes to infinity. In addition to asymptotic optimality, we also obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.
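
As a rough illustration of the tie-breaking in these two policies (a sketch only, not the paper's implementation; the names compatible, queue_len, and rate are placeholders, and the JFIQ fallback when no compatible server is idle is an assumption):

```python
def jfsq(compatible, queue_len, rate):
    """Join-the-Fastest-of-the-Shortest-Queue: among the compatible servers
    with the minimum queue length, send the job to the fastest one."""
    shortest = min(queue_len[i] for i in compatible)
    candidates = [i for i in compatible if queue_len[i] == shortest]
    return max(candidates, key=lambda i: rate[i])

def jfiq(compatible, queue_len, rate):
    """Join-the-Fastest-of-the-Idle-Queue: if some compatible server is idle,
    send the job to the fastest idle one; otherwise fall back to JFSQ here
    (the paper's exact fallback rule is not specified in this abstract)."""
    idle = [i for i in compatible if queue_len[i] == 0]
    if idle:
        return max(idle, key=lambda i: rate[i])
    return jfsq(compatible, queue_len, rate)
```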
Award ID(s):
1739189 1934986
NSF-PAR ID:
10298041
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the ACM on Measurement and Analysis of Computing Systems
Volume:
4
Issue:
3
ISSN:
2476-1249
Page Range / eLocation ID:
1 to 37
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies load balancing for many-server systems with N servers. Each server has a buffer of size b − 1, and can have at most one job in service and b − 1 jobs in the buffer. The service time of a job follows the Coxian-2 distribution. We focus on steady-state performance of load balancing policies in the heavy-traffic regime where the normalized load of the system is λ = 1 − N^(−α) for 0 < α < 0.5. We identify a set of policies that achieve asymptotically zero waiting. This set includes several classical policies such as join-the-shortest-queue (JSQ), join-the-idle-queue (JIQ), idle-one-first (I1F), and power-of-d-choices (Po-d) with d = O(N^α log N). The proof of the main result is based on Stein's method and state space collapse. A key technical contribution of this paper is the iterative state space collapse approach, which leads to a simple generator approximation when applying Stein's method.
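
To make the scaling regime concrete, a small sketch (the function name and the constant c are illustrative, not from the paper) of how the load and the power-of-d sample size scale with N and α:

```python
import math

def heavy_traffic_params(N, alpha, c=1.0):
    """Illustrative only: the normalized load and a power-of-d sample size
    consistent with the regime above (λ = 1 − N^(−α), d = Θ(N^α log N));
    the constant c is a placeholder, not a value from the paper."""
    assert 0 < alpha < 0.5
    load = 1.0 - N ** (-alpha)
    d = max(1, math.ceil(c * N ** alpha * math.log(N)))
    return load, d

# Example: with N = 10,000 servers and α = 0.4, the load is about 0.975
# and d is about 367.
print(heavy_traffic_params(10_000, 0.4))
```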

     
  2. The fundamental problem in the study of parallel-server systems is that of finding and analyzing routing policies of arriving jobs to the servers that efficiently balance the load on the servers. The most well-studied policies are (in decreasing order of efficiency) join the shortest workload (JSW), which assigns arrivals to the server with the least workload; join the shortest queue (JSQ), which assigns arrivals to the smallest queue; the power-of-d (PW(d)), which assigns arrivals to the shortest among d queues that are sampled from the total of n queues uniformly at random; and uniform routing, under which arrivals are routed to one of the n queues uniformly at random. In this paper we study the stability problem of parallel-server systems, assuming that routing errors may occur, so that arrivals may be routed to the wrong queue (not the smallest among the relevant queues) with a positive probability. We treat this routing mechanism as a probabilistic routing policy, named a p-allocation policy, that generalizes the PW(d) policy, and thus also JSQ and uniform routing, where p is an n-dimensional vector whose components are the routing probabilities. Our goal is to study the (in)stability problem of the system under this routing mechanism, and under its "nonidling" version, which assigns new arrivals to an idle server, if such a server is available, and otherwise routes according to the p-allocation rule. We characterize a sufficient condition for stability, and prove that the stability region, as a function of the system's primitives and p, is in general smaller than the set [Formula: see text]. Our analyses build on representing the queue process as a continuous-time Markov chain in an ordered space of n-dimensional real-valued vectors, and using a generalized form of the Schur-convex order.
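
A minimal sketch of this routing mechanism, under the assumption that the p-allocation rule sends an arrival to the i-th shortest queue with probability p[i]; the paper's exact convention may differ, and the uniform choice among idle servers in the nonidling variant is a placeholder:

```python
import random

def p_allocation(queue_len, p):
    """Route the arrival to the i-th shortest queue with probability p[i].
    p = (1, 0, ..., 0) recovers JSQ, a uniform p recovers uniform routing,
    and the PW(d) sampling distribution is another special case. The exact
    convention used in the paper may differ; this is only a sketch."""
    ranked = sorted(range(len(queue_len)), key=lambda i: queue_len[i])
    rank = random.choices(range(len(ranked)), weights=p, k=1)[0]
    return ranked[rank]

def nonidling_p_allocation(queue_len, p):
    """Nonidling version: join an idle server when one exists (the choice
    among idle servers is uniform here, as a placeholder); otherwise apply
    the p-allocation rule."""
    idle = [i for i, q in enumerate(queue_len) if q == 0]
    if idle:
        return random.choice(idle)
    return p_allocation(queue_len, p)
```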
  3. Consider a system with N identical single-server queues and a number of task types, where each server is able to process only a small subset of possible task types. Arriving tasks select d(N) random compatible servers and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph between the servers and the task types. When the graph is complete bipartite, the mean-field approximation is accurate. However, such dense compatibility graphs are infeasible for large-scale implementation. We characterize a class of sparse compatibility graphs for which the mean-field approximation remains valid. For this, we introduce a novel notion, called proportional sparsity, and establish that systems with proportionally sparse compatibility graphs asymptotically match the performance of a fully flexible system. Furthermore, we show that proportionally sparse random compatibility graphs can be constructed, which reduce the server degree almost by a factor N/ln(N) compared with the complete bipartite compatibility graph.
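
For concreteness, a small sketch of the compatibility-graph interface, using a plain uniform random construction rather than the paper's proportional-sparsity construction; the degree in the example is an arbitrary assumption:

```python
import math
import random

def random_compatibility_graph(num_types, N, degree):
    """Illustrative random bipartite compatibility graph: each task type is
    made compatible with `degree` servers chosen uniformly at random. The
    paper's proportionally sparse construction is more refined; this only
    shows the interface."""
    return {t: random.sample(range(N), degree) for t in range(num_types)}

def jsq_d(task_type, graph, queue_len, d):
    """Sample d compatible servers uniformly at random and join the shortest
    queue among them."""
    compatible = graph[task_type]
    sampled = random.sample(compatible, min(d, len(compatible)))
    return min(sampled, key=lambda i: queue_len[i])

# Example with an arbitrary logarithmic degree (an assumption for
# illustration, not the paper's prescription).
N = 1000
graph = random_compatibility_graph(num_types=50, N=N, degree=5 * math.ceil(math.log(N)))
chosen = jsq_d(task_type=3, graph=graph, queue_len=[0] * N, d=2)
```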
  4. Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-server-per-job model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems.
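
The head-of-line blocking that complicates this model is easy to see in a sketch; the following minimal class is illustrative only, with invented names and no arrival or service dynamics:

```python
from collections import deque

class MultiServerJobQueue:
    """Minimal sketch of the first-come-first-served multi-server job model
    described above (class and method names are invented for illustration;
    arrival and service-time dynamics are omitted)."""

    def __init__(self, total_servers):
        self.free = total_servers
        self.waiting = deque()  # FCFS buffer of per-job server demands

    def arrive(self, servers_needed):
        self.waiting.append(servers_needed)
        self._try_start()

    def depart(self, servers_released):
        self.free += servers_released
        self._try_start()

    def _try_start(self):
        # Start jobs strictly in arrival order and stop at the first job
        # that does not fit: this head-of-line blocking is what can leave
        # servers idle while later jobs wait.
        while self.waiting and self.waiting[0] <= self.free:
            self.free -= self.waiting.popleft()
```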