We study the design of state dependent control for a closed queueing network model, inspired by shared transportation systems such as ridesharing. In particular, we focus on the design of assignment policies, wherein the platform can choose which supply unit to dispatch to meet an incoming customer request. The supply unit subsequently becomes available at the destination after dropping off the customer. We consider the proportion of dropped demand in steady state as the performance measure. We propose a family of simple and explicit state dependent policies called Scaled MaxWeight (SMW) policies and prove that under the complete resource pooling (CRP) condition (analogous to a strict version of Hall's condition for bipartite matchings), any SMW policy induces an exponential decay of the demand-dropping probability as the number of supply units scales to infinity. Furthermore, we show that there is an SMW policy that achieves the optimal exponent among all assignment policies, and we analytically specify this policy in terms of the matrix of customer-request arrival rates. The optimal SMW policy protects structurally under-supplied locations.
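To make the policy concrete, here is a minimal Python sketch of an SMW assignment rule, assuming a vector q of per-location supply counts, positive scaling parameters alpha (the policy's parameters), and a set `compatible` of locations allowed to serve the incoming request; all identifiers are illustrative rather than taken from the paper.

```python
def smw_dispatch(q, alpha, compatible):
    """Scaled MaxWeight sketch: among compatible locations that hold supply,
    dispatch from the one with the largest scaled queue length q[j] / alpha[j].
    Returns the chosen location, or None if the request must be dropped."""
    candidates = [j for j in compatible if q[j] > 0]
    if not candidates:
        return None  # no compatible supply available: demand is dropped
    return max(candidates, key=lambda j: q[j] / alpha[j])
```

In this sketch, a larger alpha[j] makes the policy more reluctant to pull supply from location j, which is the mechanism by which a well-tuned SMW policy can protect under-supplied locations.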
Blind Dynamic Resource Allocation in Closed Networks via Mirror Backpressure
We study the problem of maximizing payoff generated over a period of time in a general class of closed queueing networks with a finite, fixed number of supply units that circulate in the system. Demand arrives stochastically, and serving a demand unit (customer) causes a supply unit to relocate from the “origin” to the “destination” of the customer. The key challenge is to manage the distribution of supply in the network. We consider general controls including customer entry control, pricing, and assignment. Motivating applications include shared transportation platforms and scrip systems. Inspired by the mirror descent algorithm for optimization and the backpressure policy for network control, we introduce a rich family of mirror backpressure (MBP) control policies. The MBP policies are simple and practical, and crucially do not need any statistical knowledge of the demand (customer) arrival rates (these rates are permitted to vary in time). Under mild conditions, we propose MBP policies that are provably near optimal. Specifically, our policies lose at most Õ(K/T + 1/K + √(ηK)) payoff per customer relative to the optimal policy that knows the demand arrival rates, where K is the number of supply units, T is the total number of customers over the time horizon, and η is the demand process’ average rate of change per customer arrival. An adaptation of MBP is found to perform well in numerical experiments based on data from NYC Cab. This paper was accepted by Gabriel Weintraub, revenue management and market analytics. Funding: Y. Kanoria was supported by the National Science Foundation’s Division of Civil, Mechanical, and Manufacturing Innovation [Grant CMMI-1653477]. Supplemental Material: The data files and online appendices are available at https://doi.org/10.1287/mnsc.2023.4934.
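As a rough illustration of the backpressure idea (not the paper's exact policy, which is built on a general mirror map and also covers pricing and entry control), one can score each feasible assignment by its payoff plus the change in a congestion cost that is large where supply is scarce. The particular cost function below (a negative log of the normalized queue length) and all identifiers are assumptions made for the sketch.

```python
import math

def shadow_price(q_i, K, eps=1e-3):
    # Illustrative congestion cost (one possible mirror-map gradient):
    # large when location i is nearly out of supply, small when it is flush.
    return -math.log(q_i / K + eps)

def mbp_assign(w, feasible, dest, q, K):
    """Sketch of a backpressure-style joint entry/assignment rule: serving
    the request from j moves one supply unit j -> dest, so its net score is
    the payoff w, plus the value of a unit appearing at dest, minus the
    value of a unit leaving j. Serve from the best j if the score is
    nonnegative, otherwise reject the request."""
    best = min((j for j in feasible if q[j] > 0),
               key=lambda j: shadow_price(q[j], K), default=None)
    if best is None:
        return None  # no supply at any feasible location
    score = w + shadow_price(q[dest], K) - shadow_price(q[best], K)
    return best if score >= 0 else None
```

Note that the rule needs only the current queue lengths, not the demand arrival rates, which mirrors the "blind" property highlighted in the abstract.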
- Award ID(s):
- 1653477
- PAR ID:
- 10471807
- Publisher / Repository:
- INFORMS
- Date Published:
- Journal Name:
- Management Science
- ISSN:
- 0025-1909
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We consider a long-term average profit–maximizing admission control problem in an M/M/1 queuing system with unknown service and arrival rates. With a fixed reward collected upon service completion and a cost per unit of time enforced on customers waiting in the queue, a dispatcher decides upon arrivals whether to admit the arriving customer or not based on the full history of observations of the queue length of the system. Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24] shows that, if all the parameters of the model are known, then it is optimal to use a static threshold policy: admit if the queue length is less than a predetermined threshold and otherwise not. We propose a learning-based dispatching algorithm and characterize its regret with respect to optimal dispatch policies for the full-information model of Naor [Naor P (1969) The regulation of queue size by levying tolls. Econometrica 37(1):15–24]. We show that the algorithm achieves an O(1) regret when all optimal thresholds with full information are nonzero and achieves an [Formula: see text] regret for any specified [Formula: see text] in the case that an optimal threshold with full information is 0 (i.e., an optimal policy is to reject all arrivals), where N is the number of arrivals.
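The certainty-equivalence step inside such a learning algorithm is easy to state: given current estimates of the arrival and service rates, compute the long-run average profit of each candidate threshold from the M/M/1/t stationary distribution and keep the best one. The sketch below shows only this full-information computation (the paper's algorithm additionally handles exploration, which is what makes the zero-threshold case delicate); R, c, and the cap max_t are illustrative inputs.

```python
def avg_profit(t, lam, mu, R, c):
    """Long-run average profit of an M/M/1 queue that admits a customer only
    when fewer than t are present (i.e., an M/M/1/t system): reward R per
    served customer, cost c per unit time per customer waiting in queue."""
    rho = lam / mu
    weights = [rho ** k for k in range(t + 1)]
    Z = sum(weights)
    p = [wk / Z for wk in weights]              # stationary distribution
    throughput = lam * (1 - p[t])               # admitted = served, per unit time
    avg_in_system = sum(k * p[k] for k in range(t + 1))
    avg_in_queue = avg_in_system - (1 - p[0])   # subtract E[# in service]
    return R * throughput - c * avg_in_queue

def best_threshold(lam_hat, mu_hat, R, c, max_t=200):
    # Certainty-equivalent step: optimize the threshold under the estimates.
    return max(range(max_t + 1),
               key=lambda t: avg_profit(t, lam_hat, mu_hat, R, c))
```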
-
Motivated by a service platform, we study a two-sided network where heterogeneous demand (customers) and heterogeneous supply (workers) arrive randomly over time to get matched. Customers and workers arrive with a randomly sampled patience time (also known as reneging time in the literature) and are lost if forced to wait longer than that time to be matched. The system dynamics depend on the matching policy, which determines when to match a particular customer class with a particular worker class. Matches between classes use the head-of-line customer and worker from each class. Since customer and worker arrival processes can be very general counting processes, and the reneging times can be sampled from any finite mean distribution that is absolutely continuous, the state descriptor must track the age-in-system for every customer and worker waiting in order to be Markovian, as well as the time elapsed since the last arrival for every class. We develop a measure-valued fluid model that approximates the evolution of the discrete-event stochastic matching model and prove its solution is unique under a fixed matching policy. For a sequence of matching models, we establish a tightness result for the associated sequence of fluid-scaled state descriptors and show that any distributional limit point is a fluid model solution almost surely. When arrival rates are constant, we characterize the invariant states of the fluid model solution and show convergence to these invariant states as time becomes large. Finally, again when arrival rates are constant, we establish another tightness result for the sequence of fluid-scaled state descriptors distributed according to a stationary distribution and show that any subsequence converges to an invariant state. As a consequence, the fluid and time limits can be interchanged, which justifies regarding invariant states as first order approximations to stationary distributions.
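A compact way to see the state descriptor concretely is an event-driven sketch in which each class keeps a FIFO queue of (arrival time, patience deadline) pairs; the ages-in-system are then `now` minus the stored arrival times. The class names, the `policy` callback, and the purge-on-arrival bookkeeping below are illustrative simplifications of the general model, not the paper's construction.

```python
from collections import deque

class MatchingSystem:
    """Minimal sketch of the discrete-event matching dynamics: class-level
    FIFO queues, patience deadlines, and head-of-line matching, with the
    matching policy supplied as a pluggable function."""
    def __init__(self, customer_classes, worker_classes, policy):
        self.customers = {c: deque() for c in customer_classes}
        self.workers = {w: deque() for w in worker_classes}
        self.policy = policy  # policy(state, now) -> list of (c, w) pairs

    def _purge(self, queues, now):
        for q in queues.values():
            while q and q[0][1] <= now:   # head's patience deadline passed
                q.popleft()               # that customer/worker reneges

    def arrive(self, side, cls, now, patience):
        self._purge(self.customers, now)
        self._purge(self.workers, now)
        pool = self.customers if side == "customer" else self.workers
        pool[cls].append((now, now + patience))
        for c, w in self.policy(self, now):
            if self.customers[c] and self.workers[w]:
                self.customers[c].popleft()   # match head-of-line pair
                self.workers[w].popleft()
```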
-
We consider a distributed server system consisting of a large number of servers, each with limited capacity on multiple resources (CPU, memory, etc.). Jobs with different rewards arrive over time and require certain amounts of resources for the duration of their service. When a job arrives, the system must decide whether to admit it or reject it, and if admitted, in which server to schedule it. The objective is to maximize the expected total reward received by the system. This problem is motivated by control of cloud computing clusters, in which jobs are requests for virtual machines (VMs) or containers that reserve resources for various services, and rewards represent service priority of requests or price paid per time unit of service. We study this problem in an asymptotic regime where the number of servers and jobs’ arrival rates scale by a factor L, as L becomes large. We propose a resource reservation policy that asymptotically achieves at least 1/2, and under a certain monotone property on jobs’ rewards and resources, at least [Formula: see text] of the optimal expected reward. The policy automatically scales the number of VM slots for each job type as the demand changes and decides in which servers the slots should be created in advance, without knowledge of the traffic rates.
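As a toy illustration of the reservation idea (the actual policy also chooses how many slots each job type receives and re-plans as demand shifts), the sketch below carves a requested number of fixed-size slots out of multi-resource server capacities using first-fit; representing `servers` as a list of free-capacity vectors and using first-fit placement are assumptions of the sketch, not the paper's placement rule.

```python
def can_fit(free, demand):
    # Every resource dimension (CPU, memory, ...) must fit.
    return all(f >= d for f, d in zip(free, demand))

def reserve_slots(servers, job_type_demand, num_slots):
    """Greedy reservation sketch: carve out num_slots slots of size
    job_type_demand across servers, first-fit. Mutates the free-capacity
    vectors in place and returns the server id hosting each slot."""
    placed = []
    for sid, free in enumerate(servers):
        while len(placed) < num_slots and can_fit(free, job_type_demand):
            free[:] = [f - d for f, d in zip(free, job_type_demand)]
            placed.append(sid)
        if len(placed) == num_slots:
            break
    return placed
```

An arriving job of a given type is then admitted only if one of its pre-reserved slots is currently free, which is what decouples admission from the instantaneous state of every server.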
-
We study the optimal control problem in stochastic queueing networks with a set of job dispatchers connected to a set of parallel servers with queues. Jobs arrive at the dispatchers and get routed to the servers following some routing policy. The arrival processes of jobs and the service processes of servers are stochastic, with unknown arrival rates and service rates. Upon the completion of each job from dispatcher u_n at server s_m, a random utility whose mean is unknown is obtained. We seek to design a control policy that makes routing decisions at the dispatchers and scheduling decisions at the servers to maximize the total utility obtained by the end of a finite time horizon T. The performance of policies is measured by regret, which is defined as the difference in total expected utility with respect to the optimal dynamic policy that has access to the arrival rates, service rates, and underlying utilities. We first show that the expected utility of the optimal dynamic policy is upper bounded by T times the solution to a static linear program, where the optimization variables correspond to rates of jobs from dispatchers to servers and the feasibility region is parameterized by the arrival rates and service rates. We next propose a policy for the optimal control problem that is an integration of a learning algorithm and a control policy. The learning algorithm seeks to learn the optimal extreme point solution to the static linear program based on the information available in the optimal control problem. The control policy, a mixture of priority-based and Join-the-Shortest-Queue routing at the dispatchers and priority-based scheduling at the servers, makes decisions based on the graphical structure induced by the extreme point solutions provided by the learning algorithm. We prove that our policy achieves logarithmic regret, whereas application of existing techniques to the optimal control problem would lead to Ω(√T) regret. The theoretical analysis is further complemented with simulations to evaluate the empirical performance of our policy.
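The static LP benchmark is straightforward to write down once rates are known (or estimated). The sketch below assumes the feasibility region consists of arrival-rate caps at the dispatchers and service-rate caps at the servers, matching the description above; in the learning setting, lam, mu, and util would be replaced by empirical estimates.

```python
import numpy as np
from scipy.optimize import linprog

def static_lp_upper_bound(lam, mu, util, T):
    """Solve the static LP: choose routing rates x[n, m] to maximize total
    mean utility, with each dispatcher's outflow capped by its arrival rate
    and each server's inflow capped by its service rate. Returns T times
    the LP value, an upper bound on the optimal dynamic policy's utility."""
    N, M = util.shape
    c = -util.flatten()                       # linprog minimizes, so negate
    A, b = [], []
    for n in range(N):                        # sum_m x[n, m] <= lam[n]
        row = np.zeros(N * M)
        row[n * M:(n + 1) * M] = 1.0
        A.append(row); b.append(lam[n])
    for m in range(M):                        # sum_n x[n, m] <= mu[m]
        row = np.zeros(N * M)
        row[m::M] = 1.0
        A.append(row); b.append(mu[m])
    res = linprog(c, A_ub=np.vstack(A), b_ub=np.array(b),
                  bounds=(0, None), method="highs")
    return -res.fun * T
```

The extreme points of this LP are what the paper's learning algorithm targets, and the induced support of x determines which dispatcher-server links the routing and scheduling rules prioritize.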