Title: On Uniform Exponential Ergodicity of Markovian Multiclass Many-Server Queues in the Halfin–Whitt Regime
We study ergodic properties of Markovian multiclass many-server queues that are uniform over scheduling policies and the size of the system. The system is heavily loaded in the Halfin–Whitt regime, and the scheduling policies are work conserving and preemptive. We provide a unified approach via a Lyapunov function method that establishes Foster–Lyapunov equations for both the limiting diffusion and the prelimit diffusion-scaled queueing processes simultaneously. We first study the limiting controlled diffusion and show that if the spare capacity (safety staffing) parameter is positive, the diffusion is exponentially ergodic uniformly over all stationary Markov controls, and the invariant probability measures have uniform exponential tails. This result is sharp because when there is no abandonment and the spare capacity parameter is negative, the controlled diffusion is transient under any Markov control. In addition, we show that if all the abandonment rates are positive, the invariant probability measures have sub-Gaussian tails regardless of whether the spare capacity parameter is positive or negative. Using these results, we proceed to establish the corresponding ergodic properties for the diffusion-scaled queueing processes. In addition to providing a simpler proof of previous results in Gamarnik and Stolyar [Gamarnik D, Stolyar AL (2012) Multiclass multiserver queueing system in the Halfin–Whitt heavy traffic regime: asymptotics of the stationary distribution. Queueing Systems 71(1–2):25–51], we extend these results to multiclass models with renewal arrival processes, albeit under the assumption that the mean residual life functions are bounded. For the Markovian model with Poisson arrivals, we obtain stronger results and show that the convergence to the stationary distribution is at an exponential rate uniformly over all work-conserving stationary Markov scheduling policies.
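To make the Lyapunov function method concrete, here is a schematic version of the drift condition on which such proofs rest; the Lyapunov function, the constants, and the precise statement for the queueing model are given in the paper, so this should be read only as an illustrative template that additionally presupposes a suitable irreducibility (minorization) condition.
\[
\mathcal{L}_u V(x) \;\le\; C - c\,V(x) \qquad \text{for all states } x \text{ and all stationary Markov controls } u,
\]
where \(\mathcal{L}_u\) denotes the controlled generator, \(V \ge 1\) is a Lyapunov function, and \(c > 0\), \(C \ge 0\) do not depend on \(u\). A bound of this form yields
\[
\bigl\| P^u_t(x, \cdot\,) - \pi_u \bigr\|_{\mathrm{TV}} \;\le\; M\, V(x)\, e^{-\gamma t} \qquad \text{and} \qquad \pi_u(V) \le C/c,
\]
with \(M, \gamma > 0\) again independent of \(u\), which is precisely the kind of uniform exponential ergodicity and uniform tail control of the invariant measures stated in the abstract.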
Award ID(s): 1715875
NSF-PAR ID: 10282622
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: Mathematics of Operations Research
Volume: 46
Issue: 2
ISSN: 0364-765X
Page Range / eLocation ID: 772 to 796
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. We study the ergodic properties of a class of controlled stochastic differential equations (SDEs) driven by α-stable processes, which arise as the limiting equations of multiclass queueing models in the Halfin–Whitt regime that have heavy-tailed arrival processes. When the safety staffing parameter is positive, we show that the SDEs are uniformly ergodic and enjoy a polynomial rate of convergence to the invariant probability measure in total variation, which is uniform over all stationary Markov controls resulting in a locally Lipschitz continuous drift. We also derive a matching lower bound on the rate of convergence (under no abandonment). On the other hand, when all abandonment rates are positive, we show that the SDEs are exponentially ergodic uniformly over the above-mentioned class of controls. Analogous results are obtained for Lévy-driven SDEs arising from multiclass many-server queues under asymptotically negligible service interruptions. For these equations, we show that the aforementioned ergodic properties are uniform over all stationary Markov controls. We also extend a key functional central limit theorem concerning diffusion approximations so as to make it applicable to the models studied here.
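    As a rough illustration of the kind of equation studied in this abstract (not the paper's multidimensional controlled SDE), the sketch below runs an Euler scheme for a one-dimensional SDE driven by a symmetric α-stable process with a Halfin–Whitt-style drift. The drift form, all parameter values, and the use of scipy.stats.levy_stable for the stable increments are assumptions made for this example.

```python
import numpy as np
from scipy.stats import levy_stable

# Illustrative Euler scheme for a one-dimensional SDE driven by a symmetric
# alpha-stable Levy process (NOT the paper's multidimensional controlled
# equation):
#     dX_t = b(X_t) dt + dL_t,
# with a Halfin-Whitt-style drift b(x) = -beta - mu*min(x, 0) - ga*max(x, 0),
# where beta plays the role of the safety-staffing parameter and ga the
# abandonment rate.  All parameter values are made up for the example.

def simulate(alpha=1.6, beta=1.0, mu=1.0, ga=0.5, x0=5.0, T=200.0, dt=1e-2, seed=0):
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = x0
    # increments of a standard symmetric alpha-stable process over time dt
    dL = levy_stable.rvs(alpha, 0.0, scale=dt ** (1.0 / alpha), size=n, random_state=seed)
    for k in range(n):
        drift = -beta - mu * min(x[k], 0.0) - ga * max(x[k], 0.0)
        x[k + 1] = x[k] + drift * dt + dL[k]
    return x

if __name__ == "__main__":
    path = simulate()
    # crude stability check: the time-averaged |X_t| should stay bounded
    print("time-averaged |X|:", np.abs(path).mean())
```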
  2. The property of (quasi-)reversibility of Markov chains has led to elegant characterizations of the steady-state distribution for complex queueing networks, e.g., the celebrated Jackson networks and the BCMP (Baskett, Chandy, Muntz, Palacios) and Kelly theorems. In a nutshell, despite the complicated interactions, in steady state the queues in such networks exhibit independence, which leads to explicit calculations of distributional properties of the queueing network that may seem impossible at the outset. The model of stochastic processing networks (cf. Harrison 2000) captures a variety of dynamic resource allocation problems, including flow-level networks used for modeling bandwidth sharing in the Internet, switched networks (cf. Shah, Wischik 2006) for modeling packet scheduling in Internet routers and wireless medium access, and hybrid flow-packet networks for modeling job-and-packet-level scheduling in data centers. Unlike before, an appropriate resource allocation or scheduling policy is required in such networks to achieve good performance. Given the complexity, asymptotic analytic approaches, such as fluid models or Lyapunov–Foster criteria to establish positive recurrence and heavy-traffic or diffusion approximations to characterize the scaled steady-state distribution, became the methods of choice. Remarkable progress has been made along these lines over the past few decades, but much more is needed to match the explicit calculations available in the context of reversible networks. In this work, we present an alternative approach that leads to a non-asymptotic, explicit characterization of the steady-state distribution akin to the BCMP/Kelly theorems. This involves (a) identifying a "relaxation" of the given stochastic processing network in terms of an appropriate (quasi-)reversible queueing network, and (b) finding a resource allocation or scheduling policy of interest that "emulates" the "relaxed" network within "small error". The proof is in the pudding; we present three examples of this program: (i) distributed scheduling in wireless networks, (ii) scheduling in switched networks, and (iii) flow-packet scheduling in a data center. The notion of "baseline performance" (cf. Harrison, Mandayam, Shah, Yang 2014) naturally emerges as a consequence of this program. We also discuss open questions pertaining to multi-hop networks and computational complexity.
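    As a minimal reminder of the kind of explicit calculation that reversibility-type results make possible, the sketch below computes the product-form stationary distribution of a two-node open Jackson network; the rates and routing matrix are made-up numbers, and this is textbook Jackson-network material rather than anything specific to the present work.

```python
import numpy as np

# Toy illustration of the product-form ("explicit") steady state that
# (quasi-)reversibility delivers: a two-node open Jackson network.
# External arrival rates r, service rates mu, and routing matrix P are
# made-up numbers for the example.

r = np.array([1.0, 0.5])          # external Poisson arrival rates
mu = np.array([3.0, 4.0])         # service rates
P = np.array([[0.0, 0.4],         # routing probabilities between nodes
              [0.2, 0.0]])        # (leftover probability leaves the network)

# traffic equations: lam = r + P^T lam
lam = np.linalg.solve(np.eye(2) - P.T, r)
rho = lam / mu                    # per-node utilizations (must be < 1)

def pi(n1, n2):
    """Product-form stationary probability of (n1, n2) jobs at the two nodes."""
    return (1 - rho[0]) * rho[0] ** n1 * (1 - rho[1]) * rho[1] ** n2

print("effective arrival rates:", lam, "utilizations:", rho)
print("P(empty network) =", pi(0, 0))
```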
  3. Problem definition: We study scheduling of multiclass impatient customers in parallel-server queueing systems. At the time of arrival, customers are identified as belonging to one of many classes, and the class determines the service and patience time distributions as well as cost characteristics. From the system's perspective, customers of the same class become differentiated by their residual patience times as they wait in queue. We leverage this property and propose two novel and easy-to-implement multiclass scheduling policies. Academic/practical relevance: Scheduling multiclass impatient customers is an important and challenging topic, especially when customers' patience times are nonexponential. In these contexts, even for customers of the same class, processing them under the first-come, first-served (FCFS) policy is suboptimal. This is because, at the time of arrival, the system only knows the overall patience distribution from which a customer's patience value is drawn, and as time elapses, the estimate of the customer's residual patience time can be further updated. For nonexponential patience distributions, such an update indeed reveals additional information, and using this information to implement within-class prioritization can lead to additional benefits relative to the FCFS policy. Methodology: We use fluid approximations to analyze the multiclass scheduling problem, with ideas borrowed from convex optimization. These approximations are known to perform well for large systems, and we use simulations to validate our proposed policies for small systems. Results: We propose a multiclass time-in-queue policy that prioritizes both across customer classes and within each class using a simple rule and further show that most of the gains of such a policy can be achieved by deviating from within-class FCFS for at most one customer class. In addition, for systems with exponential patience times, our policy reduces to a simple priority-based policy, which we prove is asymptotically optimal for Markovian systems with an optimality gap that does not grow with system scale. Managerial implications: Our work provides managers with ways of improving quality of service in parallel-server queueing systems. We propose easy-to-implement policies that perform well relative to reasonable benchmarks. Our work also adds to the academic literature on multiclass queueing systems by demonstrating the joint benefits of cross- and within-class prioritization.

    Funding: A. Bassamboo received financial support from the National Science Foundation [Grant CMMI 2006350]. C. (A.) Wu received financial support from the Hong Kong General Research Fund [Early Career Scheme, Project 26206419].

    Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.1190.
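    As a purely illustrative sketch of the within- and across-class prioritization described in this abstract (the actual index and its derivation are in the paper), the following Python snippet serves the waiting customer with the largest hypothetical class-weighted time-in-queue index; the weights and the linear form of the index are assumptions made for the example.

```python
from dataclasses import dataclass

# Sketch of a multiclass "time-in-queue" prioritization rule in the spirit of
# the abstract: when a server frees up, serve the waiting customer with the
# largest class-dependent index of its elapsed wait.  The linear index
# weight[cls] * wait is a placeholder, not the policy derived in the paper.

@dataclass
class Customer:
    arrival: float   # time the customer joined the queue
    cls: int         # customer class

weight = {0: 2.0, 1: 1.0}   # hypothetical class weights (cost/patience based)

def next_customer(backlog, now):
    """Pop and return the waiting customer with the highest priority index."""
    if not backlog:
        return None
    best = max(backlog, key=lambda c: weight[c.cls] * (now - c.arrival))
    backlog.remove(best)
    return best

# tiny usage example: backlog is a plain list of waiting customers
backlog = [Customer(0.0, 1), Customer(2.0, 0), Customer(3.0, 1)]
print(next_customer(backlog, now=5.0))   # the class-0 customer wins under these weights
```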

     
  4. We study ergodic properties of a class of Markov-modulated general birth–death processes under fast regime switching. The first set of results concerns the ergodic properties of the properly scaled joint Markov process with a parameter that is taken to be large. Under very weak hypotheses, we show that if the averaged process is exponentially ergodic for large values of the parameter, then the same applies to the original joint Markov process. The second set of results concerns steady-state diffusion approximations, under the assumption that the ‘averaged’ fluid limit exists. Here, we establish convergence rates for the moments of the approximating diffusion process to those of the Markov-modulated birth–death process. This is accomplished by comparing the generator of the approximating diffusion and that of the joint Markov process. We also provide several examples that demonstrate how the theory can be applied.
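    A toy simulation can illustrate the averaging phenomenon described above: below, a birth–death process with an M/M/∞-type death rate has its birth rate modulated by a fast two-state Markov chain (switching rates scaled by a large parameter n), and its long-run mean queue length is compared with that of the averaged process. The rates, the linear death rate, and the scaling are illustrative assumptions, not the paper's general setting.

```python
import numpy as np

# Toy Gillespie simulation of a birth-death process whose birth rate is
# modulated by a fast two-state Markov chain (switching rates scaled by n).
# For this M/M/infinity-type example, the "averaged" chain has birth rate
# lam_bar = sum_z pi(z) * lam(z) and stationary mean lam_bar / mu.
# All rates below are made-up illustrative numbers.

def simulate(n=50, T=500.0, lam=(2.0, 6.0), switch=(1.0, 1.0), mu=1.0, seed=1):
    rng = np.random.default_rng(seed)
    t, q, z = 0.0, 0, 0          # time, queue length, modulating state
    occ_time = 0.0               # integral of q over time (for the time average)
    while t < T:
        rates = np.array([lam[z], mu * q, n * switch[z]])  # birth, death, switch
        total = rates.sum()
        dt = rng.exponential(1.0 / total)
        occ_time += q * dt
        t += dt
        event = rng.choice(3, p=rates / total)
        if event == 0:
            q += 1
        elif event == 1:
            q -= 1
        else:
            z = 1 - z
    return occ_time / t

mu = 1.0
pi_z = np.array([0.5, 0.5])               # stationary law of the fast two-state chain
lam_bar = pi_z @ np.array([2.0, 6.0])     # averaged birth rate
print("empirical mean queue length:", simulate(mu=mu))
print("averaged-process mean (lam_bar/mu):", lam_bar / mu)
```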
  5. Off-policy evaluation (OPE) in reinforcement learning is notoriously difficult in long- and infinite-horizon settings due to diminishing overlap between behavior and target policies. In this paper, we study the role of Markovian and time-invariant structure in efficient OPE. We first derive the efficiency bounds and efficient influence functions for OPE when one assumes each of these structures. This precisely characterizes the curse of horizon: in time-variant processes, OPE is only feasible in the near-on-policy setting, where behavior and target policies are sufficiently similar. But in time-invariant Markov decision processes, our bounds show that truly off-policy evaluation is feasible, even with just one dependent trajectory, and provide the limits of how well we could hope to do. We develop a new estimator based on double reinforcement learning (DRL) that leverages this structure for OPE. Our DRL estimator simultaneously uses estimated stationary density ratios and q-functions; it remains efficient when both are estimated at slow, nonparametric rates and remains consistent when either is estimated consistently. We investigate these properties and the performance benefits of leveraging the problem structure for more efficient OPE.
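    For intuition about the estimator described above, the sketch below forms a simplified doubly robust off-policy value estimate from an estimated stationary density ratio and an estimated q-function, and it stays consistent when either input is correct. The interface, the discounted normalization, and the names w_hat and q_hat are assumptions made for this sketch; the paper's efficient influence function and estimator are more involved.

```python
import numpy as np

# Simplified doubly robust off-policy value estimate of the kind the abstract
# describes: it combines an estimated stationary density ratio w_hat(s, a)
# with an estimated q-function q_hat(s, a), and it is consistent if either of
# the two is estimated correctly.  Interfaces and normalization are
# illustrative choices, not the paper's exact construction.

def dr_value(transitions, s0_samples, w_hat, q_hat, pi_probs, gamma):
    """transitions: iterable of (s, a, r, s_next) collected under the behavior policy.
    pi_probs(s): dict mapping each action to its probability under the target policy."""
    def v(s):  # state value of the target policy implied by q_hat
        return sum(p * q_hat(s, a) for a, p in pi_probs(s).items())

    # one-step Bellman residuals reweighted by the stationary density ratio
    corrections = [
        w_hat(s, a) * (r + gamma * v(s_next) - q_hat(s, a))
        for (s, a, r, s_next) in transitions
    ]
    # plug-in value from q_hat at the initial-state distribution
    baseline = np.mean([v(s0) for s0 in s0_samples])
    return baseline + np.mean(corrections) / (1.0 - gamma)
```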