skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on May 20, 2026

Title: A Primal-Dual Approach to Constrained Markov Decision Processes with Applications to Queue Scheduling and Inventory Management
In many operations management problems, we need to make decisions sequentially to minimize the cost, satisfying certain constraints. One modeling approach to such problems is the constrained Markov decision process (CMDP). In this work, we develop a data-driven primal-dual algorithm to solve CMDPs. Our approach alternatively applies regularized policy iteration to improve the policy and subgradient ascent to maintain the constraints. Under mild regularity conditions, we show that the algorithm converges at rate [Formula: see text], where T is the number of iterations, for both the discounted and long-run average cost formulations. Our algorithm can be easily combined with advanced deep learning techniques to deal with complex large-scale problems with the additional benefit of straightforward convergence analysis. When the CMDP has a weakly coupled structure, our approach can further reduce the computational complexity through an embedded decomposition. We apply the algorithm to two operations management problems: multiclass queue scheduling and multiproduct inventory management. Numerical experiments demonstrate that our algorithm, when combined with appropriate value function approximations, generates policies that achieve superior performance compared with state-of-the-art heuristics. This paper was accepted by Baris Ata, stochastic models and simulation. Funding: Y. Chen was supported by the Hong Kong Research Grants Council, Early Career Scheme Fund [Grant 26508924], and partially supported by a grant from the National Natural Science Foundation of China [Grant 72495125]. J. Dong was supported by the National Science Foundation [Grant 1944209]. Supplemental Material: The data files are available at https://doi.org/10.1287/mnsc.2022.03736 .  more » « less
Award ID(s):
1944209
PAR ID:
10591738
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
INFORMS
Date Published:
Journal Name:
Management Science
ISSN:
0025-1909
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Problem definition: Inventory management problems with periodic and controllable resets occur in the context of managing water storage in the developing world and dynamically optimizing endcap promotion duration in retail outlets. In this paper, we consider a set of sequential decision problems in which the decision maker must not only balance holding and shortage costs but discard all inventory before a fixed number of decision epochs with the option for an early inventory reset. Methodology/results: Finding optimal policies for these problems through dynamic programming presents unique challenges because of the nonconvex nature of the resulting value functions. Moreover, this structure cannot be readily analyzed even with extended convexity definitions, such as K-convexity. Managerial implications: Our key contribution is to present sufficient conditions that ensure the optimal policy has an easily interpretable structure, which generalizes the well-known [Formula: see text] policy from the operations management literature. Furthermore, we demonstrate that, under these rather mild conditions, the optimal policy exhibits a four-threshold structure. We then conclude with computational experiments, thereby illustrating the policy structures that can be extracted in various inventory management scenarios. Funding: This work was supported by the National Science Foundation [Grant CMMI-1847666] and the Division of Graduate Education [Grant DGE-2125913]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.0318 . 
    more » « less
  2. Price-based revenue management is an important problem in operations management with many practical applications. The problem considers a seller who sells one or multiple products over T consecutive periods and is subject to constraints on the initial inventory levels of resources. Whereas, in theory, the optimal pricing policy could be obtained via dynamic programming, computing the exact dynamic programming solution is often intractable. Approximate policies, such as the resolving heuristics, are often applied as computationally tractable alternatives. In this paper, we show the following two results for price-based network revenue management under a continuous price set. First, we prove that a natural resolving heuristic attains O(1) regret compared with the value of the optimal policy. This improves the [Formula: see text] regret upper bound established in the prior work by Jasin in 2014. Second, we prove that there is an [Formula: see text] gap between the value of the optimal policy and that of the fluid model. This complements our upper bound result by showing that the fluid is not an adequate information-relaxed benchmark when analyzing price-based revenue management algorithms. Funding: This work was supported in part by the National Science Foundation [Grant CMMI-2145661]. 
    more » « less
  3. Discretization-based approaches to solving online reinforcement learning problems are studied extensively on applications such as resource allocation and cache management. The two major questions in designing discretization-based algorithms are how to create the discretization and when to refine it. There are several experimental results investigating heuristic approaches to these questions but little theoretical treatment. In this paper, we provide a unified theoretical analysis of model-free and model-based, tree-based adaptive hierarchical partitioning methods for online reinforcement learning. We show how our algorithms take advantage of inherent problem structure by providing guarantees that scale with respect to the “zooming” instead of the ambient dimension, an instance-dependent quantity measuring the benignness of the optimal [Formula: see text] function. Many applications in computing systems and operations research require algorithms that compete on three facets: low sample complexity, mild storage requirements, and low computational burden for policy evaluation and training. Our algorithms are easily adapted to operating constraints, and our theory provides explicit bounds across each of the three facets. Funding: This work is supported by funding from the National Science Foundation [Grants ECCS-1847393, DMS-1839346, CCF-1948256, and CNS-1955997] and the Army Research Laboratory [Grant W911NF-17-1-0094]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2396 . 
    more » « less
  4. In this paper, we propose AdaBB, an adaptive gradient method based on the Barzilai-Borwein stepsize. The algorithm is line-search-free and parameter-free, and it essentially provides a convergent variant of the Barzilai-Borwein method for general convex optimization problems. We analyze the ergodic convergence of the objective function value and the convergence of the iterates for solving general convex optimization problems. Compared with existing works along this line of research, our algorithm gives the best lower bounds on the stepsize and the average of the stepsizes. Furthermore, we present extensions of the proposed algorithm for solving locally strongly convex and composite convex optimization problems where the objective function is the sum of a smooth function and a nonsmooth function. In the case of local strong convexity, we achieve linear convergence. Our numerical results also demonstrate very promising potential of the proposed algorithms on some representative examples. Funding: S. Ma is supported by the National Science Foundation [Grants DMS-2243650, CCF-2308597, CCF-2311275, and ECCS-2326591] and a startup fund from Rice University. J. Yang is supported by the National Natural Science Foundation of China [Grants 12431011 and 12371301] and the Natural Science Foundation for Distinguished Young Scholars of Gansu Province [Grant 22JR5RA223]. 
    more » « less
  5. We provide tools to analyze information design problems subject to constraints. We do so by extending an insight by Le Treust and Tomala to the case of multiple inequality and equality constraints. Namely, that an information design problem subject to constraints can be represented as an unconstrained information design problem with additional states, one for each constraint. Thus, without loss of generality, optimal solutions induce as many posteriors as the number of states and constraints. We provide results that refine this upper bound. Furthermore, we provide conditions under which there is no duality gap in constrained information design, thus validating a Lagrangian approach. We illustrate our results with applications to mechanism design with limited commitment and persuasion of a privately informed receiver. Funding: L. Doval acknowledges the support of the National Science Foundation through [Grant SES-2131706]. V. Skreta acknowledges the support from the National Science Foundation through [Grant SES-1851729] and from the European Research Council (ERC) through consolidator [Grant 682417]. Supplemental Material: The e-companion is available at https://doi.org/10.1287/moor.2022.1346 . 
    more » « less