Consider a queuing system with K parallel queues in which the server for each queue processes jobs at rate n and the total arrival rate to the system is [Formula: see text], where [Formula: see text] and n is large. Interarrival and service times are taken to be independent and exponentially distributed. It is well known that the join-the-shortest-queue (JSQ) policy has many desirable load-balancing properties. In particular, in comparison with uniformly at random routing, the time asymptotic total queue-length of a JSQ system, in the heavy traffic limit, is reduced by a factor of K. However, this decrease in total queue-length comes at the price of a high communication cost of order [Formula: see text] because at each arrival instant, the state of the full K-dimensional system needs to be queried. In view of this, it is of interest to study alternative routing policies that have lower communication costs and yet have similar load-balancing properties as JSQ. In this work, we study a family of such rank-based routing policies, which we will call Marginal Size Bias Load-Balancing policies, in which [Formula: see text] of the incoming jobs are routed to servers with probabilities depending on their ranked queue length and the remaining jobs are routed uniformly at random. A particular case of such routing schemes, referred to as the marginal JSQ (MJSQ) policy, is one in which all the [Formula: see text] jobs are routed using the JSQ policy. Our first result provides a heavy traffic approximation theorem for such queuing systems in terms of reflected diffusions in the positive orthant [Formula: see text]. It turns out that, unlike the JSQ system, where, due to a state space collapse, the heavy traffic limit is characterized by a one-dimensional reflected Brownian motion, in the setting of MJSQ (and for the more general rank-based routing schemes), there is no state space collapse, and one obtains a novel diffusion limit which is the constrained analogue of the well-studied Atlas model (and other rank-based diffusions) that arise from certain problems in mathematical finance. Next, we prove an interchange of limits ([Formula: see text] and [Formula: see text]) result which shows that, under conditions, the steady state of the queuing system is well approximated by that of the limiting diffusion. It turns out that the latter steady state can be given explicitly in terms of product laws of Exponential random variables. Using these explicit formulae, and the interchange of limits result, we compute the time asymptotic total queue-length in the heavy traffic limit for the MJSQ system. We find the striking result that, although in going from JSQ to MJSQ, the communication cost is reduced by a factor of [Formula: see text], the steady-state heavy traffic total queue-length increases by at most a constant factor (independent of n, K) which can be made arbitrarily close to one by increasing a MJSQ parameter. We also study the case where the system is overloaded—namely, [Formula: see text]. For this case, we show that although the K-dimensional MJSQ system is unstable, unlike the setting of random routing, the system has certain desirable and quantifiable load-balancing properties. In particular, by establishing a suitable interchange of limits result, we show that the steady-state difference between the maximum and the minimum queue lengths stays bounded in probability (in the heavy traffic parameter n). Funding: Financial support from the National Science Foundation [RTG Award DMS-2134107] is gratefully acknowledged. S. Banerjee received financial support from the National Science Foundation [NSF-CAREER Award DMS-2141621]. A. Budhiraja received financial support from the National Science Foundation [Grant DMS-2152577].
more »
« less
This content will become publicly available on July 8, 2026
Many-Server Asymptotics for Join-the-Shortest-Queue in the Super-Halfin-Whitt Scaling Window
Join-the-shortest queue (JSQ) is a classical benchmark for the performance of parallel-server queueing systems because of its strong optimality properties. Recently, there has been significant progress in understanding its large-system asymptotic behavior. In this paper, we analyze the JSQ policy in the super-Halfin-Whitt scaling window when load per server [Formula: see text] scales with the system size N as [Formula: see text] for [Formula: see text] and [Formula: see text]. We establish that the centered and scaled total queue length process converges to a certain Bessel process with negative drift, and the associated (centered and scaled) steady-state total queue length, indexed by N, converges to a [Formula: see text] distribution. The limit laws are universal in the sense that they do not depend on the value of [Formula: see text] and exhibit fundamentally different behavior from both the Halfin–Whitt regime ([Formula: see text]) and the nondegenerate slowdown (NDS) regime ([Formula: see text]). Funding: This work was supported by the National Science Foundation to S. Banerjee [Grants CAREER DMS-2141621 and RTG DMS-2134107] and D. Mukherjee and Z. Zhao [Grants CIF-2113027 and CPS-2240982].
more »
« less
- Award ID(s):
- 2113027
- PAR ID:
- 10634975
- Publisher / Repository:
- INFORMS
- Date Published:
- Journal Name:
- Mathematics of Operations Research
- ISSN:
- 0364-765X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The fundamental problem in the study of parallel-server systems is that of finding and analyzing routing policies of arriving jobs to the servers that efficiently balance the load on the servers. The most well-studied policies are (in decreasing order of efficiency) join the shortest workload (JSW), which assigns arrivals to the server with the least workload; join the shortest queue (JSQ), which assigns arrivals to the smallest queue; the power-of-[Formula: see text] (PW([Formula: see text])), which assigns arrivals to the shortest among [Formula: see text] queues that are sampled from the total of [Formula: see text] queues uniformly at random; and uniform routing, under which arrivals are routed to one of the [Formula: see text] queues uniformly at random. In this paper we study the stability problem of parallel-server systems, assuming that routing errors may occur, so that arrivals may be routed to the wrong queue (not the smallest among the relevant queues) with a positive probability. We treat this routing mechanism as a probabilistic routing policy, named a [Formula: see text]-allocation policy, that generalizes the PW([Formula: see text]) policy, and thus also the JSQ and uniform routing, where [Formula: see text] is an [Formula: see text]-dimensional vector whose components are the routing probabilities. Our goal is to study the (in)stability problem of the system under this routing mechanism, and under its “nonidling” version, which assigns new arrivals to an idle server, if such a server is available, and otherwise routes according to the [Formula: see text]-allocation rule. We characterize a sufficient condition for stability, and prove that the stability region, as a function of the system’s primitives and [Formula: see text], is in general smaller than the set [Formula: see text]. Our analyses build on representing the queue process as a continuous-time Markov chain in an ordered space of [Formula: see text]-dimensional real-valued vectors, and using a generalized form of the Schur-convex order.more » « less
-
We consider a load-balancing system composed of a fixed number of single-server queues operating under the well-known join-the-shortest queue policy and where jobs/customers are impatient and abandon if they do not receive service after some (random) amount of time. In this setting, we characterize the centered and appropriately scaled steady-state queue-length distribution (hereafter referred to as limiting distribution) in the limit as the abandonment rate goes to zero at the same time as the load either converges to one or is larger than one. Depending on the arrival, service, and abandonment rates, we observe three different regimes of operation that yield three different limiting distributions. The first regime is when the system is underloaded, and its load converges relatively slowly to one. In this case, abandonments do not affect the limiting distribution, and we obtain the same exponential distribution as in the system without abandonments. When the load converges to one faster, we have the second regime, where abandonments become significant. Here, the system undergoes a phase transition, and the limiting distribution is a truncated Gaussian. Further, the third regime is when the system is heavily overloaded, and so, the queue lengths are very large. In this case, we show that the limiting distribution converges to a normal distribution. To establish our results, we first prove a weaker form of state space collapse by providing a uniform bound on the second moment of the (unscaled) perpendicular component of the queue lengths, which shows that the system behaves like a single-server queue. We then use exponential Lyapunov functions to characterize the limiting distribution of the steady-state queue-length vector. Funding: This work was supported by the National Science Foundation [Grants CMMI-2140534 and EPCN-2144316].more » « less
-
We characterize heavy-traffic process and steady-state limits for systems staffed according to the square-root safety rule, when the service requirements of the customers are perfectly correlated with their individual patience for waiting in queue. Under the usual many-server diffusion scaling, we show that the system is asymptotically equivalent to a system with no abandonment. In particular, the limit is the Halfin-Whitt diffusion for the [Formula: see text] queue when the traffic intensity approaches its critical value 1 from below, and is otherwise a transient diffusion, despite the fact that the prelimit is positive recurrent. To obtain a refined measure of the congestion due to the correlation, we characterize a lower-order fluid (LOF) limit for the case in which the diffusion limit is transient, demonstrating that the queue in this case scales like [Formula: see text]. Under both the diffusion and LOF scalings, we show that the stationary distributions converge weakly to the time-limiting behavior of the corresponding process limit. Funding: This work was supported by the National Natural Science Foundation of China [Grant 72188101] and the Division of Civil, Mechanical and Manufacturing Innovation [Grants 1763100 and 2006350].more » « less
-
Let ρ be a general law-invariant convex risk measure, for instance, the average value at risk, and let X be a financial loss, that is, a real random variable. In practice, either the true distribution μ of X is unknown, or the numerical computation of [Formula: see text] is not possible. In both cases, either relying on historical data or using a Monte Carlo approach, one can resort to an independent and identically distributed sample of μ to approximate [Formula: see text] by the finite sample estimator [Formula: see text] (μNdenotes the empirical measure of μ). In this article, we investigate convergence rates of [Formula: see text] to [Formula: see text]. We provide nonasymptotic convergence rates for both the deviation probability and the expectation of the estimation error. The sharpness of these convergence rates is analyzed. Our framework further allows for hedging, and the convergence rates we obtain depend on neither the dimension of the underlying assets nor the number of options available for trading. Funding: Daniel Bartl is grateful for financial support through the Vienna Science and Technology Fund [Grant MA16-021] and the Austrian Science Fund [Grants ESP-31 and P34743]. Ludovic Tangpi is supported by the National Science Foundation [Grant DMS-2005832] and CAREER award [Grant DMS-2143861].more » « less
An official website of the United States government
