We consider the problem of scheduling to minimize asymptotic tail latency in an M/G/1 queue with unknown job sizes. When the job size distribution is heavy-tailed, numerous policies that do not require job size information (e.g. Processor Sharing, Least Attained Service) are known to be strongly tail optimal, meaning that their response time tail has the fastest possible asymptotic decay. In contrast, for light-tailed size distributions, only in the last few years have policies been developed that outperform simple First-Come First-Served (FCFS). The most recent of these is γ-Boost, which achieves strong tail optimality in the light-tailed setting. But thus far, all policies that outperform FCFS in the light-tailed setting, including γ-Boost, require known job sizes. In this paper, we design a new scheduling policy that achieves strong tail optimality in the light-tailed M/G/1 with unknown job sizes. Surprisingly, the optimal policy turns out to be a variant of the Gittins policy, but with a novel and unusual feature: it uses a negative discount rate. Our work also applies to systems with partial information about job sizes, covering γ-Boost as an extreme case when job sizes are in fact fully known.
more »
« less
When Does the Gittins Policy Have Asymptotically Optimal Response Time Tail in the M/G/1?
We consider scheduling in the M/G/1 queue with unknown job sizes. It is known that the Gittins policy minimizes mean response time in this setting. However, the behavior of the tail of response time under Gittins is poorly understood, even in the large-response-time limit. Characterizing Gittins’s asymptotic tail behavior is important because if Gittins has optimal tail asymptotics, then it simultaneously provides optimal mean response time and good tail performance. In this work, we give the first comprehensive account of Gittins’s asymptotic tail behavior. For heavy-tailed job sizes, we find that Gittins always has asymptotically optimal tail. The story for light-tailed job sizes is less clear-cut: Gittins’s tail can be optimal, pessimal, or in between. To remedy this, we show that a modification of Gittins avoids pessimal tail behavior, while achieving near-optimal mean response time.
more »
« less
- Award ID(s):
- 2307008
- PAR ID:
- 10532338
- Publisher / Repository:
- INFORMS
- Date Published:
- Journal Name:
- Operations Research
- ISSN:
- 0030-364X
- Subject(s) / Keyword(s):
- queueing theory scheduling M/G/1 queue Gittins index response time tail heavy-tailed distributions light-tailed distributions
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We study the problem of scheduling jobs in a queueing system, specifically an M/G/1 with light-tailed job sizes, to asymptotically optimize the response time tail. This means scheduling to make P[T > t], the chance a job's response time exceeds t, decay as quickly as possible in the t \to \infty limit. For some time, the best known policy was First-Come First-Served (FCFS), which has an asymptotically exponential tail: P[T > t] ~ C e^-γ t . FCFS achieves the optimal decay rate γ, but its tail constant C is suboptimal. Only recently have policies that improve upon FCFS's tail constant been discovered. But it is unknown what the optimal tail constant is, let alone what policy might achieve it. In this paper, we derive a closed-form expression for the optimal tail constant C, and we introduce γ-Boost, a new policy that achieves this optimal tail constant. Roughly speaking, γ-Boost operates similarly to FCFS, but it pretends that small jobs arrive earlier than their true arrival times. This significantly reduces the response time of small jobs without unduly delaying large jobs, improving upon FCFS's tail constant by up to 50% with only moderate job size variability, with even larger improvements for higher variability. While these results are for systems with full job size information, we also introduce and analyze a version of γ-Boost that works in settings with partial job size information, showing it too achieves significant gains over FCFS. Finally, we show via simulation that γ-Boost has excellent practical performance.more » « less
-
Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses on maximizing utilization, with almost nothing known about mean response time. In simpler settings, such as various known-size single-server-job settings, minimizing mean response time is merely a matter of prioritizing small jobs. However, for the multiserver-job system, prioritizing small jobs is not enough, because we must also ensure servers are not unnecessarily left idle. Thus, minimizing mean response time requires prioritizing small jobs while simultaneously maximizing throughput. Our question is how to achieve these joint objectives. We devise the ServerFilling-SRPT scheduling policy, which is the first policy to minimize mean response time in the multiserver-job model in the heavy traffic limit. In addition to proving this heavy-traffic result, we present empirical evidence that ServerFilling-SRPT outperforms all existing scheduling policies for all loads, with improvements by orders of magnitude at higher loads. Because ServerFilling-SRPT requires knowing job sizes, we also define the ServerFilling-Gittins policy, which is optimal when sizes are unknown or partially known.more » « less
-
null (Ed.)The Gittins scheduling policy minimizes the mean response in the single-server M/G/1 queue in a wide variety of settings. Most famously, Gittins is optimal when preemption is allowed and service requirements are unknown but drawn from a known distribution. Gittins is also optimal much more generally, adapting to any amount of available information and any preemption restrictions. However, scheduling to minimize mean response time in a multiserver setting, specifically the central-queue M/G/k, is a much more difficult problem. In this work we give the first general analysis of Gittins in the M/G/k. Specifically, we show that under extremely general conditions, Gittins's mean response time in the M/G/k is at most its mean response time in the M/G/1 plus an $O(łog(1/(1 - ρ)))$ additive term, where ρ is the system load. A consequence of this result is that Gittins is heavy-traffic optimal in the M/G/k if the service requirement distribution S satisfies $$\mathbfE [S^2(łog S)^+] < \infty$$. This is the most general result on minimizing mean response time in the M/G/k to date. To prove our results, we combine properties of the Gittins policy and Palm calculus in a novel way. Notably, our technique overcomes the limitations of tagged job methods that were used in prior scheduling analyses.more » « less
-
null (Ed.)Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Thus edges represent locality constraints, i.e., an arbitrary job can only be served at servers which contain certain data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies named Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue, where ties are broken in favor of the fastest servers. Under a "well-connected'' graph condition, we show that JFSQ and JFIQ are asymptotically optimal in the mean response time when the number of servers goes to infinity. In addition to asymptotic optimality, we also obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.more » « less
An official website of the United States government

