
Title: On the Average-Cost Optimality Equations and Convergence of Discounted-Cost Relative Value Functions for Inventory Control Problems with Quasiconvex Cost Functions
Journal Name:
Proceedings of the IEEE Conference on Decision & Control
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies a periodic-review single-commodity setup-cost inventory model with backorders and holding/backlog costs satisfying quasiconvexity assumptions. We show that the Markov decision process for this inventory model satisfies the assumptions that lead to the validity of optimality equations for discounted and average-cost problems and to the existence of optimal (s,S) policies. In particular, we prove the equicontinuity of the family of discounted value functions and the convergence of optimal discounted lower thresholds to the optimal average-cost lower threshold for some sequence of discount factors converging to 1. If an arbitrary nonnegative amount of inventory can be ordered, we establish stronger convergence properties: (i) the optimal discounted lower thresholds converge to the optimal average-cost lower threshold; and (ii) the discounted relative value functions converge to the average-cost relative value function. These convergence results were previously known only for subsequences of discount factors, even for problems with convex holding/backlog costs. The results of this paper also hold for problems with fixed lead times.
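The (s,S) policies discussed above order up to level S whenever the inventory position falls below s, paying a fixed setup cost per order. A minimal simulation sketch, with hypothetical parameters (setup cost K, per-unit cost c, uniform demand, and the quasiconvex holding/backlog cost taken as |x| for illustration):

```python
import random

def simulate_sS(s, S, T=10_000, K=10.0, c=1.0, seed=0):
    """Estimate the long-run average cost of an (s, S) policy with backorders.

    Hypothetical setup: K is the fixed setup cost, c the per-unit ordering
    cost, demand is i.i.d. uniform on {0,...,4}, and the holding/backlog
    cost h(x) = |x| is a simple quasiconvex choice."""
    rng = random.Random(seed)
    x, total = 0.0, 0.0          # x = inventory position (negative = backorders)
    for _ in range(T):
        if x < s:                # order up to S whenever below the lower threshold
            total += K + c * (S - x)
            x = float(S)
        d = rng.randrange(5)     # random demand this period
        x -= d                   # unmet demand is backordered
        total += abs(x)          # holding/backlog cost
    return total / T
```

Sweeping s and S over a grid of this estimator is one informal way to see the optimal lower threshold the abstract refers to.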
  2. We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with significant literature establishing near optimality of a simple class of policies called “base-stock policies” as well as the convexity of long-run average cost under those policies. We consider a relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm when compared with the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of [Formula: see text] for the inventory control problem. Here, [Formula: see text] is the fixed and known lead time, and D is an unknown parameter of the demand distribution described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on L, which significantly improves the previously best-known regret bounds for this problem where the dependence on L was exponential. Our techniques utilize the convexity of the long-run average cost and a newly derived bound on the “bias” of base-stock policies to establish an almost black box connection between the problem of learning in Markov decision processes (MDPs) with these properties and the stochastic convex bandit problem. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex asymptotic average cost functions. 
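A base-stock policy of the kind analyzed above keeps the inventory position (on-hand plus in-transit stock) at a target level b, with orders arriving after the lead time L and excess demand lost. A minimal simulation sketch under assumed costs (h per unit held, p per lost sale) and uniform demand; the `sales` variable marks the censored quantity a learner would actually observe:

```python
import random
from collections import deque

def simulate_base_stock(b, L, T=10_000, h=1.0, p=4.0, seed=0):
    """Estimate the long-run average cost of a base-stock policy with
    lead time L and lost sales.

    Hypothetical costs: h per unit of on-hand inventory, p per lost sale.
    Demand is i.i.d. uniform on {0,...,4}; only sales = min(demand, stock)
    would be observable under censoring."""
    rng = random.Random(seed)
    on_hand = float(b)
    pipeline = deque()           # outstanding orders, oldest first
    total = 0.0
    for _ in range(T):
        # order up to the base-stock level b on the inventory position
        pipeline.append(max(0.0, b - on_hand - sum(pipeline)))
        if len(pipeline) > L:    # the order placed L periods ago arrives
            on_hand += pipeline.popleft()
        d = rng.randrange(5)
        sales = min(on_hand, d)  # censored observation: demand beyond stock is unseen
        total += h * (on_hand - sales) + p * (d - sales)
        on_hand -= sales
    return total / T
```

A learning algorithm of the type described would tune b from censored sales alone; the convexity of this average cost in b is what lets the abstract's reduction to stochastic convex bandits go through.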
  3. Abstract

    The existence of barren plateaus has recently revealed new training challenges in quantum machine learning (QML). Uncovering the mechanisms behind barren plateaus is essential in understanding the scope of problems that QML can efficiently tackle. Barren plateaus have recently been shown to exist when learning global properties of random unitaries, which is relevant when learning black hole dynamics. Establishing whether local cost functions can circumvent these barren plateaus is pertinent if we hope to apply QML to quantum many-body systems. We prove a no-go theorem showing that local cost functions encounter barren plateaus in learning random unitary properties.
