We consider load balancing in large-scale heterogeneous server systems in the presence of data locality that imposes constraints on which tasks can be assigned to which servers. The constraints are naturally captured by a bipartite graph between the servers and the dispatchers handling assignments of various arrival flows. When a task arrives, the corresponding dispatcher assigns it to a server with the shortest queue among [Formula: see text] randomly selected servers obeying these constraints. Server processing speeds are heterogeneous, and they depend on the server type. For a broad class of bipartite graphs, we characterize the limit of the appropriately scaled occupancy process, both on the process level and in steady state, as the system size becomes large. Using such a characterization, we show that imposing data locality constraints can significantly improve the performance of heterogeneous systems. This is in stark contrast to either heterogeneous servers in a full flexible system or data locality constraints in systems with homogeneous servers, both of which have been observed to degrade the system performance. Extensive numerical experiments corroborate the theoretical results. Funding: This work was partially supported by the National Science Foundation [CCF. 07/2021–06/2024].
more »
« less
Load Balancing Under Strict Compatibility Constraints
Consider a system with N identical single-server queues and a number of task types, where each server is able to process only a small subset of possible task types. Arriving tasks select [Formula: see text] random compatible servers and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph between the servers and the task types. When the graph is complete bipartite, the mean-field approximation is accurate. However, such dense compatibility graphs are infeasible for large-scale implementation. We characterize a class of sparse compatibility graphs for which the mean-field approximation remains valid. For this, we introduce a novel notion, called proportional sparsity, and establish that systems with proportionally sparse compatibility graphs asymptotically match the performance of a fully flexible system. Furthermore, we show that proportionally sparse random compatibility graphs can be constructed, which reduce the server degree almost by a factor [Formula: see text] compared with the complete bipartite compatibility graph.
more »
« less
- Award ID(s):
- 2113027
- PAR ID:
- 10323989
- Date Published:
- Journal Name:
- Mathematics of Operations Research
- ISSN:
- 0364-765X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We compute the eigenvalue fluctuations of uniformly distributed random biregular bipartite graphs with fixed and growing degrees for a large class of analytic functions. As a key step in the proof, we obtain a total variation distance bound for the Poisson approximation of the number of cycles and cyclically non-backtracking walks in random biregular bipartite graphs, which might be of independent interest. We also prove a semicircle law for random [Formula: see text]-biregular bipartite graphs when [Formula: see text]. As an application, we translate the results to adjacency matrices of uniformly distributed random regular hypergraphs.more » « less
-
null (Ed.)Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Thus edges represent locality constraints, i.e., an arbitrary job can only be served at servers which contain certain data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies named Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue, where ties are broken in favor of the fastest servers. Under a "well-connected'' graph condition, we show that JFSQ and JFIQ are asymptotically optimal in the mean response time when the number of servers goes to infinity. In addition to asymptotic optimality, we also obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.more » « less
-
The fundamental problem in the study of parallel-server systems is that of finding and analyzing routing policies of arriving jobs to the servers that efficiently balance the load on the servers. The most well-studied policies are (in decreasing order of efficiency) join the shortest workload (JSW), which assigns arrivals to the server with the least workload; join the shortest queue (JSQ), which assigns arrivals to the smallest queue; the power-of-[Formula: see text] (PW([Formula: see text])), which assigns arrivals to the shortest among [Formula: see text] queues that are sampled from the total of [Formula: see text] queues uniformly at random; and uniform routing, under which arrivals are routed to one of the [Formula: see text] queues uniformly at random. In this paper we study the stability problem of parallel-server systems, assuming that routing errors may occur, so that arrivals may be routed to the wrong queue (not the smallest among the relevant queues) with a positive probability. We treat this routing mechanism as a probabilistic routing policy, named a [Formula: see text]-allocation policy, that generalizes the PW([Formula: see text]) policy, and thus also the JSQ and uniform routing, where [Formula: see text] is an [Formula: see text]-dimensional vector whose components are the routing probabilities. Our goal is to study the (in)stability problem of the system under this routing mechanism, and under its “nonidling” version, which assigns new arrivals to an idle server, if such a server is available, and otherwise routes according to the [Formula: see text]-allocation rule. We characterize a sufficient condition for stability, and prove that the stability region, as a function of the system’s primitives and [Formula: see text], is in general smaller than the set [Formula: see text]. Our analyses build on representing the queue process as a continuous-time Markov chain in an ordered space of [Formula: see text]-dimensional real-valued vectors, and using a generalized form of the Schur-convex order.more » « less
-
We consider large-scale load balancing systems where processing time distribution of tasks depend on both task and server types. We analyze the system in the asymptotic regime where the number of task and server types tend to infinity proportionally to each other. In such heterogeneous setting, popular policies like Join Fastest Idle Queue (JFIQ), Join Fastest Shortest Queue (JFSQ) are known to perform poorly and they even shrink the stability region. Moreover, to the best of our knowledge, in this setup, finding a scalable policy with provable performance guarantee has been an open question prior to this work. In this paper, we propose and analyze two asymptotically delay-optimal dynamic load balancing approaches: (a) one that efficiently reserves the processing capacity of each server for good tasks and route tasks under the Join Idle Queue policy; and (b) a speed-priority policy that increases the probability of servers processing tasks at a high speed. Introducing a novel analytical framework and using the mean-field method and stochastic coupling arguments, we prove that both policies above achieve asymptotic zero queueing, whereby the probability that a typical task is assigned to an idle server tends to 1 as the system scales.more » « less
An official website of the United States government

