A Primal-Dual Approach to Constrained Markov Decision Processes with Applications to Queue Scheduling and Inventory Management

Chen, Yi; Dong, Jing; Wang, Zhaoran; Zhang, Chuheng

doi:10.1287/mnsc.2022.03736

In many operations management problems, we need to make decisions sequentially to minimize cost while satisfying certain constraints. One modeling approach to such problems is the constrained Markov decision process (CMDP). In this work, we develop a data-driven primal-dual algorithm to solve CMDPs. Our approach alternates between regularized policy iteration, which improves the policy, and subgradient ascent, which maintains the constraints. Under mild regularity conditions, we show that the algorithm converges at rate [Formula: see text], where T is the number of iterations, for both the discounted and long-run average cost formulations. Our algorithm can be easily combined with advanced deep learning techniques to handle complex large-scale problems, with the additional benefit of a straightforward convergence analysis. When the CMDP has a weakly coupled structure, our approach can further reduce the computational complexity through an embedded decomposition. We apply the algorithm to two operations management problems: multiclass queue scheduling and multiproduct inventory management. Numerical experiments demonstrate that our algorithm, when combined with appropriate value function approximations, generates policies that achieve superior performance compared with state-of-the-art heuristics.

This paper was accepted by Baris Ata, stochastic models and simulation.

Funding: Y. Chen was supported by the Hong Kong Research Grants Council, Early Career Scheme Fund [Grant 26508924], and partially supported by a grant from the National Natural Science Foundation of China [Grant 72495125]. J. Dong was supported by the National Science Foundation [Grant 1944209].

Supplemental Material: The data files are available at https://doi.org/10.1287/mnsc.2022.03736.
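The alternating scheme described in the abstract can be illustrated on a toy tabular problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a discounted-cost CMDP with a single constraint, uses a KL-regularized (multiplicative-weights) policy update as a stand-in for the regularized policy iteration step, and uses projected subgradient ascent on one Lagrange multiplier. All names (`evaluate`, `primal_dual`, `eta`, `dual_step`, `budget`) and the two-state example data are hypothetical.

```python
import math

def evaluate(P, cost, policy, gamma, sweeps=200):
    """Iterative policy evaluation for a per-step cost; returns (Q, V)."""
    n_s, n_a = len(cost), len(cost[0])
    V = [0.0] * n_s
    Q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        Q = [[cost[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        V = [sum(policy[s][a] * Q[s][a] for a in range(n_a)) for s in range(n_s)]
    return Q, V

def primal_dual(P, c, d, budget, gamma=0.9, iters=100,
                eta=1.0, dual_step=0.05, s0=0):
    """Alternate a regularized policy update with a dual subgradient step.

    Assumed problem: minimize discounted cost c subject to
    (discounted cost d from state s0) <= budget.
    """
    n_s, n_a = len(c), len(c[0])
    policy = [[1.0 / n_a] * n_a for _ in range(n_s)]  # start uniform
    lam = 0.0                                         # Lagrange multiplier
    history = []
    for _ in range(iters):
        # Lagrangian per-step cost for the current multiplier.
        lag = [[c[s][a] + lam * d[s][a] for a in range(n_a)]
               for s in range(n_s)]
        Q, _ = evaluate(P, lag, policy, gamma)
        # Primal step: KL-regularized improvement,
        # new_policy(a|s) proportional to policy(a|s) * exp(-eta * Q(s, a)).
        for s in range(n_s):
            q_min = min(Q[s])  # shift for numerical stability
            w = [policy[s][a] * math.exp(-eta * (Q[s][a] - q_min))
                 for a in range(n_a)]
            z = sum(w)
            policy[s] = [x / z for x in w]
        # Dual step: projected subgradient ascent on the constraint violation.
        _, V_d = evaluate(P, d, policy, gamma)
        lam = max(0.0, lam + dual_step * (V_d[s0] - budget))
        history.append((V_d[s0], lam))
    return policy, lam, history

# Hypothetical 2-state, 2-action example: action 0 is free but consumes
# the constrained resource; action 1 costs 1 and consumes nothing.
P = [[[0.8, 0.2], [0.3, 0.7]],
     [[0.4, 0.6], [0.1, 0.9]]]      # P[s][a][s_next]
c = [[0.0, 1.0], [0.0, 1.0]]        # objective cost c[s][a]
d = [[1.0, 0.0], [1.0, 0.0]]        # constraint cost d[s][a]

policy, lam, history = primal_dual(P, c, d, budget=2.0)
print("final policy:", policy)
print("final multiplier:", lam)
```

In this sketch the multiplier rises whenever the current policy overuses the constrained resource, which makes the resource-consuming action more expensive in the next policy update. The last iterate of such a scheme may oscillate, so averaging over `history` is a common way to read off the result; whether and how the paper averages iterates is not stated in the abstract.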
