-
We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications for exploration policy. Despite the ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behavior, appear to remain largely open. In this paper, we fill this gap in the literature by studying the central limit theorem behavior of estimated Q-values and value functions under various RL settings. In particular, we explicitly identify closed-form expressions for the asymptotic variances, which allow us to efficiently construct asymptotically valid confidence regions for key RL quantities. Furthermore, we use these asymptotic expressions to design an effective exploration strategy, which we call Q-value-based Optimal Computing Budget Allocation (Q-OCBA). The policy relies on maximizing the relative discrepancies among the Q-value estimates. Numerical experiments show that our exploration strategy outperforms other benchmark policies. Funding: This work was supported by the National Science Foundation (1720433).
Free, publicly-accessible full text available March 2, 2024
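The abstract does not reproduce the closed-form variances, so the following is only a hedged illustration of the general idea (a CLT-based confidence interval built from a plug-in estimate) in the simplest case we could state with confidence: evaluating a fixed policy in a tabular MDP whose expected rewards are assumed known, with transitions estimated from n_s sampled transitions per state. A first-order (delta-method) expansion of V = (I - γP)^{-1} r gives the asymptotic covariance γ² A^{-1} D A^{-T}, where A = I - γP and D is diagonal with entries Var_{s'~P(s,·)}[V(s')] / n_s. The helper name `value_ci` and the known-reward assumption are ours, not the paper's; the paper covers more general settings, including Q-values.

```python
import numpy as np


def value_ci(P_hat, r, gamma, n_per_state, z=1.96):
    """Plug-in value estimate and delta-method confidence intervals.

    Hypothetical helper for a fixed policy in a tabular MDP.
    P_hat: (S, S) empirical transition matrix under the policy (rows sum to 1).
    r: (S,) expected one-step rewards, ASSUMED known (not estimated).
    n_per_state: (S,) number of sampled transitions from each state.
    """
    S = len(r)
    A = np.eye(S) - gamma * P_hat
    v_hat = np.linalg.solve(A, r)  # V = (I - gamma P)^{-1} r
    # Variance of V(s') for s' ~ P_hat(s, .), one entry per state s.
    mean_next = P_hat @ v_hat
    var_next = np.maximum(P_hat @ v_hat**2 - mean_next**2, 0.0)
    D = np.diag(var_next / np.asarray(n_per_state, dtype=float))
    A_inv = np.linalg.inv(A)
    # First-order expansion: V_hat - V ~ gamma * A^{-1} (P_hat - P) V.
    cov = gamma**2 * A_inv @ D @ A_inv.T
    half = z * np.sqrt(np.diag(cov))
    return v_hat, v_hat - half, v_hat + half


# Usage: estimate P from simulated transitions of a 3-state chain.
rng = np.random.default_rng(0)
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])
n = np.array([500, 500, 500])
counts = np.array([rng.multinomial(n[s], P_true[s]) for s in range(3)])
P_hat = counts / n[:, None]
v_hat, lo, hi = value_ci(P_hat, r, gamma=0.9, n_per_state=n)
```

The interval half-widths shrink like 1/sqrt(n_s), and states whose successors have widely spread values contribute the most uncertainty; with random rewards, an extra reward-variance term would enter D.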
-
Free, publicly-accessible full text available December 1, 2023
-
Free, publicly-accessible full text available January 1, 2024
-
Free, publicly-accessible full text available January 1, 2024
-
Queueing models used to capture various service settings typically assume that each customer requires a single unit of resource (one server) to be processed. However, in many service settings this assumption may fail to capture the heterogeneity in the resource requirements of different customers. We propose a multiserver queueing model with multiple customer classes in which customers from different classes may require different amounts of resources to be served. We study the optimal scheduling policy for such systems. To balance holding costs, service rates, resource requirements, and priority-induced idleness, we develop an index-based policy that we refer to as the idle-avoid [Formula: see text] rule. For a two-class two-server model, where policy-induced idleness can have a big impact on system performance, we characterize cases where the idle-avoid [Formula: see text] rule is optimal. In other cases, we establish a uniform performance bound on the amount of suboptimality incurred by the idle-avoid [Formula: see text] rule. For general multiclass multiserver queues, we establish the asymptotic optimality of the idle-avoid [Formula: see text] rule in the many-server regime. Over long time horizons, we show that the idle-avoid [Formula: see text] rule is throughput optimal. Our theoretical results, along with numerical experiments, support the good and robust performance of the proposed policy.
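The abstract does not spell out the index, so the sketch below is a hypothetical reading of the idea, not the paper's rule. We assume each class i has a holding cost rate c_i, a service rate μ_i, and a requirement of m_i servers per job, rank classes by c_i μ_i / m_i, and avoid idleness by starting lower-ranked jobs whenever the next higher-ranked job does not fit in the free servers. `JobClass`, `idle_avoid_allocate`, and the index itself are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class JobClass:
    name: str
    cost: float  # holding cost rate c_i
    mu: float    # service rate mu_i
    m: int       # number of servers each job of this class occupies


def idle_avoid_allocate(classes, queue_len, free_servers):
    """Decide how many waiting jobs of each class to start now.

    HYPOTHETICAL rule for illustration: rank classes by c_i * mu_i / m_i, then
    start jobs in that order; when the next job of a class does not fit in the
    free servers, move on to lower-ranked classes instead of idling.
    Returns (jobs started per class, servers left idle).
    """
    order = sorted(
        range(len(classes)),
        key=lambda i: classes[i].cost * classes[i].mu / classes[i].m,
        reverse=True,
    )
    started = [0] * len(classes)
    for i in order:
        fit = min(queue_len[i], free_servers // classes[i].m)
        started[i] = fit
        free_servers -= fit * classes[i].m
    return started, free_servers


a = JobClass("A", cost=1.0, mu=1.0, m=2)  # index 0.5
b = JobClass("B", cost=1.0, mu=0.4, m=1)  # index 0.4
started, idle = idle_avoid_allocate([a, b], queue_len=[3, 5], free_servers=5)
```

Here two A jobs start first, and the one remaining server is backfilled with a B job; a strict priority rule that held that server for the third A job would leave it idle.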
-
Service systems are typically resource-limited environments where scarce capacity is reserved for the most urgent customers. However, there has been growing interest in the use of proactive service when a less urgent customer may become urgent while waiting. On the one hand, serving customers while they are less urgent could mean that fewer resources are needed to fulfill their service requirements. On the other hand, using limited capacity for customers who may never need the service takes capacity away from more urgent customers who need it now. To understand this tension, we propose a multiserver queueing model with two customer classes: moderate and urgent. We allow customers to transition between classes while waiting. In this setting, we characterize how moderate and urgent customers should be prioritized for service when proactive service for moderate customers is an option. We identify an index, the modified [Formula: see text]-index, which plays an important role in determining the optimal scheduling policy. This index lends itself to an intuitive interpretation of how to balance holding costs, service times, abandonments, and transitions between customer classes. This paper was accepted by David Simchi-Levi, stochastic models and simulation.
-
The shortest-remaining-processing-time (SRPT) scheduling policy has been studied extensively for more than 50 years in single-server queues with infinitely patient jobs. Much less is known, however, about its performance in multiserver queues. In this paper, we present the first theoretical analysis of SRPT in multiserver queues with abandonment. In particular, we consider the M/GI/s+GI queue and show that, in the many-server overloaded regime, performance in the SRPT queue is asymptotically equivalent, in steady state, to that of a preemptive two-class priority queue in which customers with short service times (below a threshold) are served without waiting, and customers with long service times (above the threshold) eventually abandon without service. We prove that, asymptotically, the SRPT discipline maximizes system throughput among all scheduling disciplines. We also compare the performance of the SRPT policy with that of blind policies and study the effects of the patience-time and service-time distributions. This paper was accepted by Baris Ata, stochastic models & simulation.
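As a hedged, back-of-the-envelope illustration of the threshold picture (a fluid heuristic we state here, not the paper's analysis), assume Poisson arrivals at rate λ, exponential service times with rate μ, s servers, and overload λ/μ > s. If jobs below a threshold x* are served without waiting and keep all servers busy while longer jobs abandon, then x* solves λ E[S; S ≤ x*] = s, and throughput is λ P(S ≤ x*). The function name and the exponential-service assumption are ours.

```python
import math


def srpt_fluid_threshold(lam, mu, s, tol=1e-10):
    """Threshold x* and throughput of the fluid SRPT picture.

    ASSUMPTIONS (illustrative, not the paper's derivation): Poisson arrivals at
    rate lam, Exp(mu) service times, s servers, overloaded (lam / mu > s).
    Jobs shorter than x* are served immediately and keep all servers busy:
        lam * E[S; S <= x*] = s.
    Longer jobs abandon without service.
    """
    if lam / mu <= s:
        raise ValueError("fluid threshold is only defined when lam / mu > s")

    def busy_servers(x):
        # lam * E[S 1{S <= x}] for S ~ Exp(mu)
        return lam * (1.0 - math.exp(-mu * x) * (1.0 + mu * x)) / mu

    lo, hi = 0.0, 1.0
    while busy_servers(hi) < s:  # bracket the root
        hi *= 2.0
    while hi - lo > tol:  # bisection; busy_servers is increasing in x
        mid = 0.5 * (lo + hi)
        if busy_servers(mid) < s:
            lo = mid
        else:
            hi = mid
    x_star = 0.5 * (lo + hi)
    throughput = lam * (1.0 - math.exp(-mu * x_star))  # lam * P(S <= x*)
    return x_star, throughput


x_star, thr = srpt_fluid_threshold(lam=150.0, mu=1.0, s=100)
```

With λ = 150, μ = 1, and s = 100, the threshold comes out near 2.29 and the throughput near 135 jobs per unit time, above the roughly s μ = 100 completions per unit time of a size-blind policy that keeps all servers busy and serves every started job to completion. Spending fixed server capacity on the shortest jobs packs in the most completions, which is consistent with the throughput-optimality result.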
-
Consider a fractional Brownian motion (fBM) [Formula: see text] with Hurst index [Formula: see text]. We construct a probability space supporting both B_H and a fully simulatable process [Formula: see text] such that [Formula: see text] with probability one for any user-specified error bound [Formula: see text]. When [Formula: see text], we further strengthen our error guarantee to the α-Hölder norm for any [Formula: see text]. This enables us to extend our algorithm to the simulation of fBM-driven stochastic differential equations [Formula: see text]. Under mild regularity conditions on the drift and diffusion coefficients of Y, we construct a probability space supporting both Y and a fully simulatable process [Formula: see text] such that [Formula: see text] with probability one. Our algorithms enjoy a tolerance-enforcement feature, under which the error bounds can be updated sequentially in an efficient way. Thus, the algorithms can be readily combined with other advanced simulation techniques to efficiently estimate expectations of functionals of fBMs.
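The construction with a uniform error bound is too involved to sketch here. For orientation only, the following uses a different, standard technique: exact sampling of fBM at finitely many grid points via a Cholesky factorization of the covariance Cov(B_H(t), B_H(u)) = ½(t^{2H} + u^{2H} - |t - u|^{2H}). It reproduces the exact joint distribution on the grid but, unlike the algorithm described above, says nothing about the path between grid points. The time horizon [0, 1] and the function names are illustrative assumptions.

```python
import numpy as np


def fbm_cov(t, H):
    """Covariance matrix of fBM with Hurst index H at the times in t."""
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (s ** (2 * H) + u ** (2 * H) - np.abs(s - u) ** (2 * H))


def sample_fbm_grid(n, H, rng):
    """Exact sample of B_H on the grid 0, 1/n, ..., 1 (horizon [0, 1] assumed)."""
    t = np.linspace(1.0 / n, 1.0, n)  # skip t = 0, where B_H(0) = 0
    L = np.linalg.cholesky(fbm_cov(t, H))  # cov = L @ L.T
    path = L @ rng.standard_normal(n)
    return np.concatenate(([0.0], t)), np.concatenate(([0.0], path))


rng = np.random.default_rng(1)
times, path = sample_fbm_grid(n=100, H=0.7, rng=rng)
```

For H = 1/2 the covariance reduces to min(t, u), i.e., standard Brownian motion. The Cholesky step costs O(n³), which is one reason grid-based exact samplers scale poorly and why error-controlled constructions are attractive for path functionals.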