 Award ID(s):
 2006350
 NSFPAR ID:
 10464194
 Date Published:
 Journal Name:
 Operations Research
 Volume:
 70
 Issue:
 4
 ISSN:
 0030364X
 Page Range / eLocation ID:
 2456 to 2476
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Consider a system with N identical singleserver queues and a number of task types, where each server is able to process only a small subset of possible task types. Arriving tasks select [Formula: see text] random compatible servers and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph between the servers and the task types. When the graph is complete bipartite, the meanfield approximation is accurate. However, such dense compatibility graphs are infeasible for largescale implementation. We characterize a class of sparse compatibility graphs for which the meanfield approximation remains valid. For this, we introduce a novel notion, called proportional sparsity, and establish that systems with proportionally sparse compatibility graphs asymptotically match the performance of a fully flexible system. Furthermore, we show that proportionally sparse random compatibility graphs can be constructed, which reduce the server degree almost by a factor [Formula: see text] compared with the complete bipartite compatibility graph.more » « less

null (Ed.)In multiserver queueing systems where there is no central queue holding all incoming jobs, job dispatching policies are used to assign incoming jobs to the queue at one of the servers. Classic job dispatching policies such as jointheshortestqueue and shortest expected delay assume that the service rates and queue lengths of the servers are known to the dispatcher. In this work, we tackle the problem of job dispatching without the knowledge of service rates and queue lengths, where the dispatcher can only obtain noisy estimates of the service rates by observing job departures. This problem presents a novel explorationexploitation tradeoff between sending jobs to all the servers to estimate their service rates, and exploiting the currently known fastest servers to minimize the expected queueing delay. We propose a banditbased exploration policy that learns the service rates from observed job departures. Unlike the standard multiarmed bandit problem where only one out of a finite set of actions is optimal, here the optimal policy requires identifying the optimal fraction of incoming jobs to be sent to each server. We present a regret analysis and simulations to demonstrate the effectiveness of the proposed banditbased exploration policy.more » « less

We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic with unknown arrival rates and service rates. Upon the completion of each job from dispatcher
u_{n} at servers_{m} , a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizonT . The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to arrival rates, service rates and underlying utilities.We first show that the expected utility of the optimal dynamic policy is upper bounded by
T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of prioritybased and JointtheShortestQueue routing at the dispatchers and prioritybased scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret whereas application of existing techniques to the optimal control problem would lead to Ω(√T )regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy. 
We consider load balancing in largescale heterogeneous server systems in the presence of data locality that imposes constraints on which tasks can be assigned to which servers. The constraints are naturally captured by a bipartite graph between the servers and the dispatchers handling assignments of various arrival flows. When a task arrives, the corresponding dispatcher assigns it to a server with the shortest queue among [Formula: see text] randomly selected servers obeying these constraints. Server processing speeds are heterogeneous, and they depend on the server type. For a broad class of bipartite graphs, we characterize the limit of the appropriately scaled occupancy process, both on the process level and in steady state, as the system size becomes large. Using such a characterization, we show that imposing data locality constraints can significantly improve the performance of heterogeneous systems. This is in stark contrast to either heterogeneous servers in a full flexible system or data locality constraints in systems with homogeneous servers, both of which have been observed to degrade the system performance. Extensive numerical experiments corroborate the theoretical results.
Funding: This work was partially supported by the National Science Foundation [CCF. 07/2021–06/2024].

We consider largescale load balancing systems where processing time distribution of tasks depend on both task and server types. We analyze the system in the asymptotic regime where the number of task and server types tend to infinity proportionally to each other. In such heterogeneous setting, popular policies like Join Fastest Idle Queue (JFIQ), Join Fastest Shortest Queue (JFSQ) are known to perform poorly and they even shrink the stability region. Moreover, to the best of our knowledge, in this setup, finding a scalable policy with provable performance guarantee has been an open question prior to this work. In this paper, we propose and analyze two asymptotically delayoptimal dynamic load balancing approaches: (a) one that efficiently reserves the processing capacity of each server for good tasks and route tasks under the Join Idle Queue policy; and (b) a speedpriority policy that increases the probability of servers processing tasks at a high speed. Introducing a novel analytical framework and using the meanfield method and stochastic coupling arguments, we prove that both policies above achieve asymptotic zero queueing, whereby the probability that a typical task is assigned to an idle server tends to 1 as the system scales.