We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with a significant literature establishing the near-optimality of a simple class of policies called “base-stock policies,” as well as the convexity of the long-run average cost under those policies. We consider the relatively less studied problem of designing a learning algorithm for this setting when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm relative to the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of [Formula: see text] for the inventory control problem. Here, [Formula: see text] is the fixed and known lead time, and D is an unknown parameter of the demand distribution, described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on L, which significantly improves on the previously best-known regret bounds for this problem, where the dependence on L was exponential. Our techniques use the convexity of the long-run average cost and a newly derived bound on the “bias” of base-stock policies to establish an almost black-box connection between learning in Markov decision processes (MDPs) with these properties and the stochastic convex bandit problem. The techniques presented here may be of independent interest for other settings that involve large structured MDPs with convex asymptotic average cost functions.
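As a concrete illustration of the setting described above (the base-stock policy class and its environment, not the paper's learning algorithm), the following sketch simulates a fixed base-stock policy under lost sales, a positive lead time L, and censored observations. All function names and cost parameters are hypothetical choices for this sketch.

```python
import random

def simulate_base_stock(S, L, T, demand_sampler, h=1.0, p=4.0, seed=0):
    """Simulate a base-stock (order-up-to-S) policy with lead time L,
    lost sales, and censored demand; return average cost and observed sales."""
    rng = random.Random(seed)
    on_hand = S
    pipeline = [0] * L          # orders placed in the last L periods, oldest first
    total_cost = 0.0
    sales_log = []              # only sales (censored demand) are observable
    for _ in range(T):
        if L > 0:
            on_hand += pipeline.pop(0)   # oldest outstanding order arrives
        d = demand_sampler(rng)
        sold = min(d, on_hand)           # lost sales: excess demand vanishes
        lost = d - sold
        on_hand -= sold
        sales_log.append(sold)           # the firm never observes `lost`
        # order up to S over the inventory position (on hand + in transit)
        position = on_hand + sum(pipeline)
        order = max(0, S - position)
        if L > 0:
            pipeline.append(order)
        else:
            on_hand += order
        total_cost += h * on_hand + p * lost
    return total_cost / T, sales_log

avg_cost, sales = simulate_base_stock(
    S=8, L=2, T=10_000, demand_sampler=lambda rng: rng.randint(0, 6))
```

A learner in this setting only sees `sales_log`, which is exactly the censoring that makes estimating the demand distribution, and hence the best level S, nontrivial.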
Data‐driven inventory control involving fixed setup costs and discrete censored demand
Abstract We investigate a data‐driven dynamic inventory control problem involving fixed setup costs and lost sales. Random demand arrivals stem from a demand distribution that is known only to belong to a vast ambiguity set. Lost sales and demand ambiguity together complicate the problem through censoring, namely, the firm's inability to observe the lost portion of the demand data. Our main policy idea advocates periodically ordering up to high levels for learning purposes and, in intervening periods, exploiting the information gained in learning periods. By regret, we mean the price paid for ambiguity in long‐run average performance. When demand has finite support, we can accomplish a regret bound on the order of [Formula: see text], which almost matches a known lower bound as long as inventory costs are genuinely convex. Major policy adjustments are warranted for the more complex case involving unbounded demand support. The resulting regret could range between [Formula: see text] and [Formula: see text], depending on the nature of the moment‐related bounds that characterize the degree of ambiguity. These are improvable to [Formula: see text] when distributions are light‐tailed. Our simulation demonstrates the merits of the various policy ideas.
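The learn-then-exploit idea above can be sketched in a deliberately simplified single-period lost-sales model (no setup costs or inventory carryover, so this illustrates only the censoring/exploration trade-off, not the paper's policy). In learning periods the firm orders up to the maximum possible demand, so the observation is uncensored; in between, it orders up to the empirically best level. All names and parameters are hypothetical.

```python
import random

def explore_exploit_inventory(T, demand_max, explore_every=10,
                              h=1.0, p=4.0, seed=1):
    """Toy learn-then-exploit loop: exploration periods yield uncensored
    demand samples; exploitation periods use the empirical cost minimizer."""
    rng = random.Random(seed)
    clean_samples = []        # uncensored demand observations
    total_cost = 0.0
    for t in range(T):
        if t % explore_every == 0 or not clean_samples:
            level = demand_max                    # learning period: no censoring
        else:
            # exploit: empirical holding/lost-sales cost per candidate level
            def emp_cost(y):
                return sum(h * max(y - d, 0) + p * max(d - y, 0)
                           for d in clean_samples) / len(clean_samples)
            level = min(range(demand_max + 1), key=emp_cost)
        d = rng.randint(0, demand_max)
        if level >= demand_max:
            clean_samples.append(d)               # demand fully observed
        total_cost += h * max(level - d, 0) + p * max(d - level, 0)
    return total_cost / T
```

The tension the abstract describes shows up directly: exploration periods are costly (they overstock on purpose) but are the only periods that produce clean demand data.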
- Award ID(s): 1662629
- PAR ID: 10669715
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Naval Research Logistics (NRL)
- Volume: 71
- Issue: 8
- ISSN: 0894-069X
- Page Range / eLocation ID: 1220 to 1236
- Subject(s) / Keyword(s): demand censoring, fixed setup costs, inventory, regret
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
-
When a new product has just been introduced or the economy has just entered a new phase, a firm is often at a loss as to what the underlying demand pattern has become, let alone how best to respond to it. In “Dynamic Inventory Control with Fixed Setup Costs and Unknown Discrete Demand Distribution,” Davoodi, Katehakis, and Yang tackle this challenging problem by tailoring ordering decisions to empirical distributions formed from past demand observations. In the presence of fixed setup costs, however, an (s,S) policy that is optimal in the conventional known-distribution setting would take many periods for its long-term benefit to be realized. Therefore, a good online policy has to balance letting ordering decisions settle for long periods against adjusting them frequently to take advantage of newly available information. When properly balanced, such policies can indeed achieve tight bounds for the performance measure of regret.
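For reference, the (s,S) benchmark mentioned above can be sketched as follows: order up to S whenever inventory falls below s, paying a fixed setup cost K per order. This is a minimal lost-sales, zero-lead-time illustration of the known-distribution policy class, not the authors' learning procedure; names and parameters are hypothetical.

```python
import random

def simulate_sS(s, S, T, K=5.0, h=1.0, p=4.0, demand_max=6, seed=0):
    """Average cost of an (s,S) policy with fixed setup cost K,
    instantaneous replenishment, and lost sales."""
    rng = random.Random(seed)
    inv = S
    total = 0.0
    for _ in range(T):
        if inv < s:                 # reorder point hit: pay K, order up to S
            total += K
            inv = S
        d = rng.randint(0, demand_max)
        sold = min(d, inv)
        total += h * (inv - sold) + p * (d - sold)   # holding + lost sales
        inv -= sold
    return total / T

avg = simulate_sS(s=2, S=8, T=10_000)
```

Because each reorder incurs K, the long-run benefit of a given (s,S) pair only reveals itself over many periods, which is precisely why frequent policy switching is costly for an online learner.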
-
Problem definition: We study a feature-based pricing problem with demand censoring in an offline, data-driven setting. In this problem, a firm is endowed with a finite amount of inventory and faces a random demand that is dependent on the offered price and the features (from products, customers, or both). Any unsatisfied demand that exceeds the inventory level is lost and unobservable. The firm does not know the demand function but has access to an offline data set consisting of quadruplets of historical features, inventory, price, and potentially censored sales quantity. Our objective is to use the offline data set to find the optimal feature-based pricing rule so as to maximize the expected profit. Methodology/results: Through the lens of causal inference, we propose a novel data-driven algorithm that is motivated by survival analysis and doubly robust estimation. We derive a finite sample regret bound to justify the proposed offline learning algorithm and prove its robustness. Numerical experiments demonstrate the robust performance of our proposed algorithm in accurately estimating optimal prices on both training and testing data. Managerial implications: The work provides practitioners with an innovative modeling and algorithmic framework for the feature-based pricing problem with demand censoring through the lens of causal inference. Our numerical experiments underscore the value of considering demand censoring in the context of feature-based pricing. Funding: The research of E. Fang is partially supported by the National Science Foundation [Grants NSF DMS-2346292, NSF DMS-2434666] and the Whitehead Scholarship. The research of C. Shi is partially supported by the Amazon Research Award. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2024.1061 .
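Survival analysis fits this data pattern because an uncensored sale reveals the demand exactly, while a stock-out reveals only that demand exceeded the recorded sales. A minimal Kaplan-Meier sketch of that idea follows; this is generic survival estimation, not the authors' doubly robust procedure, and the function name is ours.

```python
def kaplan_meier(obs):
    """Kaplan-Meier estimate of the demand survival function P(D > t)
    from pairs (sales, censored): censored=True means demand exceeded sales."""
    surv, at_risk = 1.0, len(obs)
    curve = {}
    for value in sorted({v for v, _ in obs}):
        exact = sum(1 for v, c in obs if v == value and not c)  # demand observed
        total = sum(1 for v, _ in obs if v == value)            # leave risk set
        if at_risk > 0:
            surv *= 1 - exact / at_risk
        curve[value] = surv       # estimate of P(D > value)
        at_risk -= total
    return curve
```

For example, `kaplan_meier([(2, False), (3, True), (1, False), (3, False)])` treats the `(3, True)` record as a stock-out at level 3 rather than an exact demand of 3, which is what distinguishes this estimator from a naive empirical distribution of sales.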
-
We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for the ordering strategy, and [Formula: see text], a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning research in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) problem over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transitions over multiple periods resembles the reinforcement learning question, considerably more difficult than existing bandit learning algorithms. Second, the pricing function [Formula: see text] is of infinite dimension, and approaching it is much more challenging than approaching a finite number of parameters as seen in existing research. The demand-price relationship is estimated based on an upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion.
Finally, because of the multiperiod nature of [Formula: see text] policies, the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy [Formula: see text], which is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
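The Wasserstein-based argument above operates on empirical distributions. For scalar demand with equally many samples, the order-1 Wasserstein distance reduces to the average gap between sorted samples, which the following small sketch computes (the function name is ours):

```python
def wasserstein_1d(xs, ys):
    """Order-1 Wasserstein distance between two empirical distributions on
    the real line with equally many samples: mean gap between sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

wasserstein_1d([0, 1, 2, 3], [1, 1, 2, 4])  # -> (1 + 0 + 0 + 1) / 4 = 0.5
```

Convergence of the empirical demand distribution to the truth in this metric is what lets the learner's DP solution track the optimal one, even though the samples are dependent.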
-
Prior research on inventory control has been wide ranging, yet the environmental implications of an (s,S) inventory policy remain uninvestigated. This paper seeks to bridge the gap by characterising a firm’s voluntary environmental policy in the setup of an (s,S) inventory control policy. We suggest a mixed model structure wherein, due to the presence of fixed production costs, the inventory is determined continuously by sales and impulsively by ordering decisions obeying an optimal stopping process, while the uncertain sales process is controlled by continuous-time environmental goodwill-related decisions. We show that a firm should successively use voluntary environmental efforts to stimulate its sales when there is inventory and to increase backlogging to improve its production efficiency. Given the recurrent pattern of this policy, we conclude that voluntary environmental efforts under an (s,S) inventory control policy are not compatible with using these efforts as a means to generate ephemeral reputation insurance.