Title: Stochastic Setup-Cost Inventory Model with Backorders and Quasiconvex Cost Functions
This paper studies a periodic-review, single-commodity, setup-cost inventory model with backorders and holding/backlog costs satisfying quasiconvexity assumptions. We show that the Markov decision process for this inventory model satisfies the assumptions that lead to the validity of optimality equations for discounted and average-cost problems and to the existence of optimal (s,S) policies. In particular, we prove the equicontinuity of the family of discounted value functions and the convergence of optimal discounted lower thresholds to the optimal average-cost lower threshold for some sequence of discount factors converging to 1. If an arbitrary nonnegative amount of inventory can be ordered, we establish stronger convergence properties: (i) the optimal discounted lower thresholds converge to the optimal average-cost lower threshold; and (ii) the discounted relative value functions converge to the average-cost relative value function. These convergence results were previously known only for subsequences of discount factors, even for problems with convex holding/backlog costs. The results of this paper also hold for problems with fixed lead times.
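For readers less familiar with the policy class, the following LaTeX display states the standard form of an (s,S) policy and the discounted-cost optimality equation it solves. The notation (K for the setup cost, \bar{c} for the per-unit ordering cost, h for the one-period holding/backlog cost, D for one-period demand, \alpha for the discount factor) is a conventional choice for illustration and need not match the paper's.

% An (s,S) policy: order up to S whenever the inventory level x_t is at
% or below s; otherwise do not order.
\[
  a_t =
  \begin{cases}
    S - x_t, & x_t \le s, \\
    0,       & x_t > s.
  \end{cases}
\]
% Discounted-cost optimality equation for a setup-cost model:
\[
  v_\alpha(x) = \min_{a \ge 0}
  \Bigl\{ K\,\mathbf{1}\{a > 0\} + \bar{c}\,a
          + \mathbb{E}\bigl[\, h(x + a - D) + \alpha\, v_\alpha(x + a - D) \,\bigr] \Bigr\}.
\]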
Award ID(s): 1636193
PAR ID: 10107957
Author(s) / Creator(s): ;
Date Published:
Journal Name: Probability in the Engineering and Informational Sciences
ISSN: 0269-9648
Page Range / eLocation ID: 1 to 40
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. In robust Markov decision processes (MDPs), the uncertainty in the transition kernel is addressed by finding a policy that optimizes the worst-case performance over an uncertainty set of MDPs. While much of the literature has focused on discounted MDPs, robust average-reward MDPs remain largely unexplored. In this paper, we focus on robust average-reward MDPs, where the goal is to find a policy that optimizes the worst-case average reward over an uncertainty set. We first take an approach that approximates average-reward MDPs using discounted MDPs. We prove that the robust discounted value function converges to the robust average reward as the discount factor goes to 1 and, moreover, that when the discount factor is sufficiently large, any optimal policy of the robust discounted MDP is also an optimal policy of the robust average-reward MDP. We further design a robust dynamic programming approach and theoretically characterize its convergence to the optimum. Then, we investigate robust average-reward MDPs directly, without using discounted MDPs as an intermediate step. We derive the robust Bellman equation for robust average-reward MDPs, prove that the optimal policy can be derived from its solution, and further design a robust relative value iteration algorithm that provably finds its solution or, equivalently, the optimal robust policy.
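As an illustration of the robust dynamic programming idea mentioned above, here is a minimal Python sketch of robust value iteration for a discounted MDP whose uncertainty set is a finite list of candidate transition kernels. The finite, (s,a)-rectangular uncertainty set and all names and parameters are assumptions made for this sketch, not the paper's construction.

import numpy as np

def robust_value_iteration(rewards, kernels, gamma=0.95, tol=1e-8, max_iter=10000):
    # rewards: (S, A) array of r(s, a).
    # kernels: list of (S, A, S) arrays, one candidate transition kernel
    #          per element of the finite uncertainty set.
    S, A = rewards.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Worst-case expected next value, taken independently for each
        # (s, a) pair ((s, a)-rectangularity over the finite set).
        worst = np.min([P @ V for P in kernels], axis=0)  # shape (S, A)
        Q = rewards + gamma * worst
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new
    return V, Q.argmax(axis=1)

# Toy example: 2 states, 2 actions, an uncertainty set of two kernels.
rng = np.random.default_rng(0)
rewards = rng.random((2, 2))
def random_kernel():
    P = rng.random((2, 2, 2))
    return P / P.sum(axis=2, keepdims=True)
V, policy = robust_value_iteration(rewards, [random_kernel(), random_kernel()])
print("robust value:", V, "greedy robust policy:", policy)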
  2. Sample complexity bounds are a common performance metric in the reinforcement learning literature. In the discounted-cost, infinite-horizon setting, all of the known bounds can be arbitrarily large as the discount factor approaches unity. These results seem to imply that a very large number of samples is required to achieve an epsilon-optimal policy. The objective of the present work is to introduce a new class of algorithms whose sample complexity is uniformly bounded over all discount factors. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that these prior bounds concern value function approximation, not policy approximation. We show that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that tends to infinity as the discount factor approaches 1, an essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors.
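The exact form of the relative Q-learning algorithm is developed in the paper itself. As a hedged illustration of the general idea, the Python sketch below instead implements the closely related RVI Q-learning algorithm of Abounadi, Bertsekas, and Borkar for average-reward problems, where subtracting the value at a fixed reference state-action pair keeps the iterates bounded. All names and parameters here are illustrative.

import numpy as np

def rvi_q_learning(env_step, n_states, n_actions, ref=(0, 0),
                   n_iters=200000, alpha=0.01, seed=0):
    # Tabular RVI Q-learning for average-reward MDPs: the value at the
    # fixed reference pair `ref` is subtracted inside the target, which
    # keeps the iterates bounded (it estimates the optimal average reward).
    # env_step(s, a, rng) -> (reward, next_state) simulates one transition.
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    s = 0
    for _ in range(n_iters):
        a = int(rng.integers(n_actions))      # uniformly exploring behavior policy
        r, s2 = env_step(s, a, rng)
        target = r + Q[s2].max() - Q[ref]     # reference term replaces discounting
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
    return Q

# Example: a two-state chain where action 1 moves and earns reward 1.
def step(s, a, rng):
    return float(a == 1), (1 - s if a == 1 else s)
print(rvi_q_learning(step, n_states=2, n_actions=2))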
  3. Problem definition: Inventory management problems with periodic and controllable resets occur in the context of managing water storage in the developing world and dynamically optimizing endcap-promotion duration in retail outlets. In this paper, we consider a set of sequential decision problems in which the decision maker must not only balance holding and shortage costs but also discard all inventory before a fixed number of decision epochs, with the option of an early inventory reset. Methodology/results: Finding optimal policies for these problems through dynamic programming presents unique challenges because of the nonconvex nature of the resulting value functions. Moreover, this structure cannot be readily analyzed even with extended convexity definitions, such as K-convexity. Managerial implications: Our key contribution is to present sufficient conditions that ensure the optimal policy has an easily interpretable structure, which generalizes the well-known (s, S) policy from the operations management literature. Furthermore, we demonstrate that, under these rather mild conditions, the optimal policy exhibits a four-threshold structure. We then conclude with computational experiments, illustrating the policy structures that can be extracted in various inventory management scenarios. Funding: This work was supported by the National Science Foundation [Grant CMMI-1847666] and the Division of Graduate Education [Grant DGE-2125913]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.0318.
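To make the early-reset structure concrete, the following toy Python sketch runs backward induction for a finite-horizon inventory problem in which, at each epoch, the decision maker either orders up to some level (paying a setup cost) or resets the inventory to zero. The cost parameters, the demand distribution, and the lost-sales simplification are invented for illustration and do not reproduce the paper's model.

import numpy as np

T = 8                      # decision epochs before the mandatory reset
X = np.arange(0, 21)       # inventory levels 0..20 (lost sales, for simplicity)
demand_vals = np.arange(0, 6)
demand_pmf = np.full(6, 1 / 6)
K, c, h, b, reset_cost = 4.0, 1.0, 0.5, 3.0, 2.0

V = np.zeros((T + 1, len(X)))
V[T] = h * X               # leftover stock penalized at the horizon (stand-in for the discard)

for t in range(T - 1, -1, -1):
    for i, x in enumerate(X):
        best = np.inf
        for y in X[X >= x]:                    # order up to level y >= x
            post = np.maximum(y - demand_vals, 0)
            short = np.maximum(demand_vals - y, 0)
            stage = (K * (y > x) + c * (y - x)
                     + demand_pmf @ (h * post + b * short))
            best = min(best, stage + demand_pmf @ V[t + 1, post])
        best = min(best, reset_cost + V[t + 1, 0])   # early-reset option
        V[t, i] = best
print(V[0])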
  4. We consider a multi-stage inventory system with stochastic demand and processing capacity constraints at each stage, in both finite-horizon and infinite-horizon discounted-cost settings. For a class of such systems characterized by having the smallest capacity at the most downstream stage and system utilization above a certain threshold, we identify the structure of the optimal policy, which represents a novel variation of the order-up-to policy. We find the explicit functional form of the optimal order-up-to levels and show that they depend (only) on upstream echelon inventories. We establish that, above the threshold utilization, this optimal policy achieves the decomposition of the multidimensional objective cost function for the system into a sum of single-dimensional convex functions. This decomposition eliminates the curse of dimensionality and allows us to solve the problem numerically. We provide a fast algorithm to determine a (tight) upper bound on this threshold utilization for capacity-constrained inventory problems with an arbitrary number of stages. We make use of this algorithm to quantify upper bounds on the threshold utilization for three-, four-, and five-stage capacitated systems over a range of model parameters and discuss the insights that emerge.
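As a hedged illustration of the policy class, the short Python simulation below runs a two-stage serial system under an echelon order-up-to policy with per-stage capacity limits. The order-up-to levels, capacities, demand model, and zero lead times are arbitrary choices for the sketch, whereas the paper derives the optimal levels in closed form.

import numpy as np

rng = np.random.default_rng(1)
T = 1000
cap = np.array([6.0, 5.0])        # processing capacity: stage 1 (downstream), stage 2
S = np.array([10.0, 18.0])        # echelon order-up-to levels (assumed, not optimized)
inv = np.array([10.0, 8.0])       # local stock at each stage; stage 1 serves demand
backlog = 0.0
for _ in range(T):
    d = rng.poisson(4.0)
    # Echelon inventory of stage k = local stock at stages 1..k minus backlog.
    ech = np.cumsum(inv) - backlog
    # Desired shipments to restore echelon positions, clipped by capacity
    # and, for the downstream stage, by upstream on-hand stock.
    want = np.maximum(S - ech, 0.0)
    ship1 = min(want[0], cap[0], inv[1])   # stage 2 -> stage 1
    ship2 = min(want[1], cap[1])           # outside supply -> stage 2
    inv += np.array([ship1, ship2 - ship1])
    served = min(d + backlog, inv[0])
    inv[0] -= served
    backlog = d + backlog - served
print("final stock:", inv, "backlog:", backlog)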
  5. Robust Markov decision processes (MDPs) aim to find a policy that optimizes the worst-case performance over an uncertainty set of MDPs. Existing studies have mostly focused on robust MDPs under the discounted-reward criterion, leaving those under the average-reward criterion largely unexplored. In this paper, we develop the first comprehensive and systematic study of robust average-reward MDPs, where the goal is to optimize the long-term average performance under the worst case. Our contributions are fourfold: (1) we prove the uniform convergence of the robust discounted value function to the robust average-reward function as the discount factor γ goes to 1; (2) we derive the robust average-reward Bellman equation, characterize the structure of its solution set, and prove the equivalence between solving the robust Bellman equation and finding the optimal robust policy; (3) we design robust dynamic programming algorithms and theoretically characterize their convergence to the optimal policy; and (4) we design two model-free algorithms utilizing the multilevel Monte Carlo approach and prove their asymptotic convergence.
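Contribution (2) centers on the robust average-reward Bellman equation. A standard statement of such an equation for an (s,a)-rectangular uncertainty set, written here in conventional LaTeX notation that need not match the authors', is:

% Robust average-reward Bellman (optimality) equation:
\[
  g + h(s) = \max_{a \in \mathcal{A}}
  \Bigl\{ r(s,a) + \min_{p \in \mathcal{P}_{s,a}}
          \sum_{s'} p(s' \mid s, a)\, h(s') \Bigr\},
  \qquad s \in \mathcal{S},
\]
% where g is the optimal worst-case average reward, h is a relative value
% function, and a maximizing action in each state yields an optimal robust policy.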