This paper studies a periodic-review single-commodity inventory model with setup costs, backorders, and holding/backlog costs satisfying quasiconvexity assumptions. We show that the Markov decision process for this inventory model satisfies the assumptions that lead to the validity of optimality equations for discounted- and average-cost problems and to the existence of optimal (s, S) policies. In particular, we prove the equicontinuity of the family of discounted value functions and the convergence of optimal discounted lower thresholds to the optimal average-cost lower threshold for some sequence of discount factors converging to 1. If an arbitrary nonnegative amount of inventory can be ordered, we establish stronger convergence properties: (i) the optimal discounted lower thresholds converge to the optimal average-cost lower threshold; and (ii) the discounted relative value functions converge to the average-cost relative value function. These convergence results were previously known only for subsequences of discount factors, even for problems with convex holding/backlog costs. The results of this paper also hold for problems with fixed lead times.
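The (s, S) structure studied in the abstract above can be illustrated with a minimal simulation sketch. The cost parameters and demand values below are hypothetical illustrations, not taken from the paper:

```python
def simulate_sS(s, S, demands, K=10.0, h=1.0, b=4.0):
    """Simulate a periodic-review (s, S) policy with backorders.

    When the inventory position x falls below the lower threshold s,
    order up to S (paying fixed setup cost K); otherwise order nothing.
    Holding cost h is charged per unit of positive inventory, backlog
    cost b per unit backordered.  Parameters here are illustrative.
    """
    x = S  # initial inventory position
    total_cost = 0.0
    for d in demands:
        if x < s:                       # lower threshold reached
            total_cost += K             # fixed setup (ordering) cost
            x = S                       # order up to the upper level S
        x -= d                          # demand realized; x < 0 means backorders
        total_cost += h * max(x, 0) + b * max(-x, 0)
    return total_cost / len(demands)    # average cost per period
```

For example, `simulate_sS(2, 5, [1, 1, 1, 1, 1])` triggers one reorder over five periods and returns the resulting average per-period cost.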
On the Average-Cost Optimality Equations and Convergence of Discounted-Cost Relative Value Functions for Inventory Control Problems with Quasiconvex Cost Functions
 Award ID(s):
 1636193
 NSF-PAR ID:
 10063441
 Date Published:
 Journal Name:
 Proceedings of the IEEE Conference on Decision & Control
 ISSN:
 0743-1546
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with a significant literature establishing the near-optimality of a simple class of policies called "base-stock policies" as well as the convexity of the long-run average cost under those policies. We consider the relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm when compared with the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of [Formula: see text] for the inventory control problem. Here, [Formula: see text] is the fixed and known lead time, and D is an unknown parameter of the demand distribution, described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on L, which significantly improves the previously best-known regret bounds for this problem, where the dependence on L was exponential. Our techniques use the convexity of the long-run average cost and a newly derived bound on the "bias" of base-stock policies to establish an almost black-box connection between the problem of learning in Markov decision processes (MDPs) with these properties and the stochastic convex bandit problem. The techniques presented here may be of independent interest for other settings that involve large structured MDPs with convex asymptotic average cost functions.
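A base-stock policy of the kind analyzed above can be sketched as a short simulation. The cost parameters and demand sequence below are illustrative assumptions, not the paper's; the policy simply orders up to a fixed level S each period, with unmet demand lost:

```python
from collections import deque

def simulate_base_stock(S, L, demands, h=1.0, p=4.0):
    """Simulate a base-stock policy with lost sales and lead time L.

    Each period, order enough to raise the inventory position
    (on-hand plus in-transit) back to the base-stock level S.
    Unmet demand is lost; cost is h per unit held plus p per unit
    of lost demand.  All parameter values here are hypothetical.
    """
    on_hand = S
    pipeline = deque([0] * L, maxlen=L) if L > 0 else deque()
    total_cost = 0.0
    for d in demands:
        if L > 0:
            on_hand += pipeline.popleft()   # order placed L periods ago arrives
        position = on_hand + sum(pipeline)  # on-hand plus in-transit
        order = max(S - position, 0)        # order up to the base-stock level
        if L > 0:
            pipeline.append(order)
        else:
            on_hand += order                # zero lead time: arrives immediately
        sold = min(on_hand, d)
        lost = d - sold                     # lost sales: no backorders
        on_hand -= sold
        total_cost += h * on_hand + p * lost
    return total_cost / len(demands)        # average cost per period
```

For instance, `simulate_base_stock(3, 0, [2, 2])` evaluates the zero-lead-time case, while a positive L routes orders through the pipeline before they become available.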

Abstract: The existence of barren plateaus has recently revealed new training challenges in quantum machine learning (QML). Uncovering the mechanisms behind barren plateaus is essential in understanding the scope of problems that QML can efficiently tackle. Barren plateaus have recently been shown to exist when learning global properties of random unitaries, which is relevant when learning black hole dynamics. Establishing whether local cost functions can circumvent these barren plateaus is pertinent if we hope to apply QML to quantum many-body systems. We prove a no-go theorem showing that local cost functions encounter barren plateaus in learning random unitary properties.