Service systems are typically limited resource environments where scarce capacity is reserved for the most urgent customers. However, there has been a growing interest in the use of proactive service when a less urgent customer may become urgent while waiting. On one hand, providing service for customers when they are less urgent could mean that fewer resources are needed to fulfill their service requirement. On the other hand, using limited capacity for customers who may never need the service in the future takes the capacity away from other more urgent customers who need it now. To understand this tension, we propose a multiserver queueing model with two customer classes: moderate and urgent. We allow customers to transition classes while waiting. In this setting, we characterize how moderate and urgent customers should be prioritized for service when proactive service for moderate customers is an option. We identify an index, the modified [Formula: see text]-index, which plays an important role in determining the optimal scheduling policy. This index lends itself to an intuitive interpretation of how to balance holding costs, service times, abandonments, and transitions between customer classes. This paper was accepted by David Simchi-Levi, stochastic models and simulation.
This content will become publicly available on July 6, 2023
Managing Queues with Different Resource Requirements
Queueing models that are used to capture various service settings typically assume that customers require a single unit of resource (server) to be processed. However, there are many service settings where such an assumption may fail to capture the heterogeneity in resource requirements of different customers. We propose a multiserver queueing model with multiple customer classes in which customers from different classes may require different amounts of resources to be served. We study the optimal scheduling policy for such systems. To balance holding costs, service rates, resource requirement, and priority-induced idleness, we develop an index-based policy that we refer to as the idle-avoid [Formula: see text] rule. For a two-class two-server model, where policy-induced idleness can have a big impact on system performance, we characterize cases where the idle-avoid [Formula: see text] rule is optimal. In other cases, we establish a uniform performance bound on the amount of suboptimality incurred by the idle-avoid [Formula: see text] rule. For general multiclass multiserver queues, we establish the asymptotic optimality of the idle-avoid [Formula: see text] rule in the many-server regime. For long-time horizons, we show that the idle-avoid [Formula: see text] is throughput optimal. Our theoretical results, along with numerical experiments, provide more »
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Operations Research
- Sponsoring Org:
- National Science Foundation
More Like this
Multiserver-job systems, where jobs require concurrent service at many servers, occur widely in practice. Essentially all of the theoretical work on multiserver-job systems focuses on maximizing utilization, with almost nothing known about mean response time. In simpler settings, such as various known-size single-server-job settings, minimizing mean response time is merely a matter of prioritizing small jobs. However, for the multiserver-job system, prioritizing small jobs is not enough, because we must also ensure servers are not unnecessarily left idle. Thus, minimizing mean response time requires prioritizing small jobs while simultaneously maximizing throughput. Our question is how to achieve these joint objectives. We devise the ServerFilling-SRPT scheduling policy, which is the first policy to minimize mean response time in the multiserver-job model in the heavy traffic limit. In addition to proving this heavy-traffic result, we present empirical evidence that ServerFilling-SRPT outperforms all existing scheduling policies for all loads, with improvements by orders of magnitude at higher loads. Because ServerFilling-SRPT requires knowing job sizes, we also define the ServerFilling-Gittins policy, which is optimal when sizes are unknown or partially known.
We consider a distributed server system consisting of a large number of servers, each with limited capacity on multiple resources (CPU, memory, etc.). Jobs with different rewards arrive over time and require certain amounts of resources for the duration of their service. When a job arrives, the system must decide whether to admit it or reject it, and if admitted, in which server to schedule it. The objective is to maximize the expected total reward received by the system. This problem is motivated by control of cloud computing clusters, in which jobs are requests for virtual machines (VMs) or containers that reserve resources for various services, and rewards represent service priority of requests or price paid per time unit of service. We study this problem in an asymptotic regime where the number of servers and jobs’ arrival rates scale by a factor L, as L becomes large. We propose a resource reservation policy that asymptotically achieves at least 1/2, and under certain monotone property on jobs’ rewards and resources, at least [Formula: see text] of the optimal expected reward. The policy automatically scales the number of VM slots for each job type as the demand changes and decides in whichmore »
We study the dynamic assortment planning problem, where for each arriving customer, the seller offers an assortment of substitutable products and the customer makes the purchase among offered products according to an uncapacitated multinomial logit (MNL) model. Because all the utility parameters of the MNL model are unknown, the seller needs to simultaneously learn customers’ choice behavior and make dynamic decisions on assortments based on the current knowledge. The goal of the seller is to maximize the expected revenue, or, equivalently, to minimize the expected regret. Although dynamic assortment planning problem has received an increasing attention in revenue management, most existing policies require the estimation of mean utility for each product and the final regret usually involves the number of products [Formula: see text]. The optimal regret of the dynamic assortment planning problem under the most basic and popular choice model—the MNL model—is still open. By carefully analyzing a revenue potential function, we develop a trisection-based policy combined with adaptive confidence bound construction, which achieves an item-independent regret bound of [Formula: see text], where [Formula: see text] is the length of selling horizon. We further establish the matching lower bound result to show the optimality of our policy. There aremore »
The shortest-remaining-processing-time (SRPT) scheduling policy has been extensively studied, for more than 50 years, in single-server queues with infinitely patient jobs. Yet, much less is known about its performance in multiserver queues. In this paper, we present the first theoretical analysis of SRPT in multiserver queues with abandonment. In particular, we consider the M/GI/s+GI queue and demonstrate that, in the many-sever overloaded regime, performance in the SRPT queue is equivalent, asymptotically in steady state, to a preemptive two-class priority queue where customers with short service times (below a threshold) are served without wait, and customers with long service times (above a threshold) eventually abandon without service. We prove that the SRPT discipline maximizes, asymptotically, the system throughput, among all scheduling disciplines. We also compare the performance of the SRPT policy to blind policies and study the effects of the patience-time and service-time distributions. This paper was accepted by Baris Ata, stochastic models & simulation.