Title: Fluid Limits for Multiclass Many-Server Queues with General Reneging Distributions and Head-of-the-Line Scheduling
We describe a fluid model with time-varying input that approximates a multiclass many-server queue with general reneging distributions (specifically, the multiclass G/GI/N+GI queue). The system dynamics depend on the policy, which is a rule for determining when to serve a given customer class. The admissible control policies are those that are head-of-the-line (HL) and nonanticipating. For a sequence of many-server queues operating under admissible HL control policies and satisfying some mild asymptotic conditions, we establish a tightness result for the sequence of fluid-scaled queue state descriptors and associated processes and show that limit points of such sequences are fluid model solutions almost surely. The tightness result, together with the characterization of distributional limit points as fluid model solutions almost surely, provides a foundation for the analysis of particular HL control policies of interest. We leverage these results to analyze a set of admissible HL control policies that we introduce, called weighted random buffer selection (WRBS), and an associated WRBS fluid model that allows multiple classes to be partially served in the fluid limit (in contrast to previously analyzed static priority policies).
Award ID(s):
2054505 1712974
NSF-PAR ID:
10332786
Journal Name:
Mathematics of Operations Research
Volume:
47
Issue:
2
ISSN:
0364-765X
Page Range / eLocation ID:
1192 to 1228
Sponsoring Org:
National Science Foundation
More Like this
  1. Problem definition: We study scheduling multi-class impatient customers in parallel server queueing systems. At the time of arrival, customers are identified as being one of many classes, and the class represents the service and patience time distributions as well as cost characteristics. From the system’s perspective, customers of the same class at time of arrival get differentiated on their residual patience time as they wait in queue. We leverage this property and propose two novel and easy-to-implement multi-class scheduling policies. Academic/practical relevance: Scheduling multi-class impatient customers is an important and challenging topic, especially when customers’ patience times are nonexponential. In these contexts, even for customers of the same class, processing them under the first-come, first-served (FCFS) policy is suboptimal. This is because, at time of arrival, the system only knows the overall patience distribution from which a customer’s patience value is drawn, and as time elapses, the estimate of the customer’s residual patience time can be further updated. For nonexponential patience distributions, such an update indeed reveals additional information, and using this information to implement within-class prioritization can lead to additional benefits relative to the FCFS policy. Methodology: We use fluid approximations to analyze the multi-class scheduling problem with ideas borrowed from convex optimization. These approximations are known to perform well for large systems, and we use simulations to validate our proposed policies for small systems. Results: We propose a multi-class time-in-queue policy that prioritizes both across customer classes and within each class using a simple rule and further show that most of the gains of such a policy can be achieved by deviating from within-class FCFS for at most one customer class. 
In addition, for systems with exponential patience times, our policy reduces to a simple priority-based policy, which we prove is asymptotically optimal for Markovian systems with an optimality gap that does not grow with system scale. Managerial implications: Our work provides managers ways of improving quality of service to manage parallel server queueing systems. We propose easy-to-implement policies that perform well relative to reasonable benchmarks. Our work also adds to the academic literature on multi-class queueing systems by demonstrating the joint benefits of cross- and within-class prioritization.

    Funding: A. Bassamboo received financial support from the National Science Foundation [Grant CMMI 2006350]. C. (A.) Wu received financial support from the Hong Kong General Research Fund [Early Career Scheme, Project 26206419].

    Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.1190.

     
  2. Queueing models that are used to capture various service settings typically assume that customers require a single unit of resource (server) to be processed. However, there are many service settings where such an assumption may fail to capture the heterogeneity in resource requirements of different customers. We propose a multiserver queueing model with multiple customer classes in which customers from different classes may require different amounts of resources to be served. We study the optimal scheduling policy for such systems. To balance holding costs, service rates, resource requirements, and priority-induced idleness, we develop an index-based policy that we refer to as the idle-avoid [Formula: see text] rule. For a two-class two-server model, where policy-induced idleness can have a big impact on system performance, we characterize cases where the idle-avoid [Formula: see text] rule is optimal. In other cases, we establish a uniform performance bound on the amount of suboptimality incurred by the idle-avoid [Formula: see text] rule. For general multiclass multiserver queues, we establish the asymptotic optimality of the idle-avoid [Formula: see text] rule in the many-server regime. For long-time horizons, we show that the idle-avoid [Formula: see text] rule is throughput optimal. Our theoretical results, along with numerical experiments, provide support for the good and robust performance of the proposed policy.
  3. Using a result of Blanchet and Wallwater (2015) for exactly simulating the maximum of a negative drift random walk queue endowed with independent and identically distributed (i.i.d.) increments, we extend it to a multi-dimensional setting and then we give a new algorithm for simulating exactly the stationary distribution of a first-in–first-out (FIFO) multi-server queue in which the arrival process is a general renewal process and the service times are i.i.d.: the FIFO GI/GI/$c$ queue with $2 \leq c < \infty$. Our method utilizes dominated coupling from the past (DCFP) as well as the random assignment (RA) discipline, and complements the earlier work in which Poisson arrivals were assumed, such as the recent work of Connor and Kendall (2015). We also consider the models in continuous time, and show that with mild further assumptions, the exact simulation of those stationary distributions can also be achieved. We also give, using our FIFO algorithm, a new exact simulation algorithm for the stationary distribution of the infinite server case, the GI/GI/$\infty$ model. Finally, we even show how to handle fork–join queues, in which each arriving customer brings $c$ jobs, one for each server.
  4. The shortest-remaining-processing-time (SRPT) scheduling policy has been extensively studied, for more than 50 years, in single-server queues with infinitely patient jobs. Yet, much less is known about its performance in multiserver queues. In this paper, we present the first theoretical analysis of SRPT in multiserver queues with abandonment. In particular, we consider the M/GI/s+GI queue and demonstrate that, in the many-server overloaded regime, performance in the SRPT queue is equivalent, asymptotically in steady state, to a preemptive two-class priority queue where customers with short service times (below a threshold) are served without wait, and customers with long service times (above a threshold) eventually abandon without service. We prove that the SRPT discipline maximizes, asymptotically, the system throughput, among all scheduling disciplines. We also compare the performance of the SRPT policy to blind policies and study the effects of the patience-time and service-time distributions. This paper was accepted by Baris Ata, stochastic models & simulation.
  5.
    Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-server-per-job model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems. 