

Title: Cash‐Flow Based Dynamic Inventory Management

Small‐to‐medium‐sized enterprises (SMEs), including many startup firms, need to manage interrelated flows of cash and inventories of goods. In this study, we model a firm that can finance its inventory (ordered or manufactured) with loans in order to meet random demand, which in general may not be time-stationary. The firm earns interest on its cash on hand and pays interest on its debt. The objective is to maximize the expected value of the firm's capital at the end of a finite planning horizon. The firm's state at the beginning of each period is characterized by the inventory level and the capital level measured in units of the product, whose sum represents the “net worth” of the firm. Our study shows that the optimal ordering policy is characterized by a pair of threshold parameters as follows. (i) If the net worth is less than the lower threshold, then the firm places a base-stock order up to the lower threshold. (ii) If the net worth is between the two thresholds, then the firm orders exactly as many units as it can afford without borrowing. (iii) If the net worth is above the upper threshold, then the firm places a base-stock order up to the upper threshold. Further, upper and lower bounds for the threshold values are developed using two simple-to-compute myopic ordering policies, which yield lower bounds for the value function. We also derive an upper bound for the value function by considering a sell-back policy. Subsequently, it is shown that policies of similar structure are optimal when the loan and deposit interest rates are piecewise linear functions, when there is a maximal loan limit, and when unsatisfied demand is backordered. Finally, further managerial insights are provided through extensive numerical studies.
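The two-threshold structure in (i)-(iii) can be stated compactly in code. Below is a minimal Python sketch, assuming all quantities are already expressed in units of the product; the threshold names `alpha` and `beta` are illustrative inputs, which in the paper come from the finite-horizon dynamic program:

```python
def order_quantity(inventory: float, capital: float,
                   alpha: float, beta: float) -> float:
    """Illustrative sketch of the two-threshold ordering policy.

    `capital` is cash on hand converted at the unit ordering cost, so
    net_worth = inventory + capital. `alpha` (lower) and `beta` (upper)
    are hypothetical threshold values; the paper derives the actual
    thresholds from the dynamic program.
    """
    net_worth = inventory + capital
    if net_worth < alpha:
        # (i) Borrow as needed: base-stock order up to the lower threshold.
        return max(alpha - inventory, 0.0)
    elif net_worth <= beta:
        # (ii) Order exactly what current cash affords, without borrowing.
        return max(capital, 0.0)
    else:
        # (iii) Base-stock order up to the upper threshold.
        return max(beta - inventory, 0.0)
```

Note that in case (ii) the post-ordering inventory equals the net worth itself, which is how the policy interpolates between the two base-stock regimes.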

NSF-PAR ID: 10247457
Author(s) / Creator(s):
Publisher / Repository: SAGE Publications
Date Published:
Journal Name: Production and Operations Management
Volume: 25
Issue: 9
ISSN: 1059-1478
Pages: 1558-1575
Sponsoring Org: National Science Foundation
More Like this
1. Reverse logistics has been gaining recognition in practice (and theory) for helping companies better match supply with demand, and thus reduce costs in their supply chains. In this paper, we study reverse logistics from the perspective of a supply chain in which each location can initiate multiple flows of product. Our first objective is to jointly optimize ordering decisions pertaining to the regular, reverse, and expedited flows of product in a stochastic, multi-stage inventory model of a logistics supply chain, in which the physical transformation of the product is completed at the most upstream location in the system. Because of those multiple flows of product, the feasible region for the problem acquires multi-dimensional boundaries that lead to the curse of dimensionality. To address this challenge, we develop a different solution method that allows us to reduce the dimensionality of the feasible region and, subsequently, identify the structure of the optimal policy. We refer to this policy as a nested echelon base-stock policy, as decisions for the different product flows are sequentially nested within each other. We show that this policy renders the model analytically and numerically tractable. Our results provide actionable policies for firms to jointly manage three different product flows in their supply chains, and allow us to arrive at insights regarding the main drivers of the value of reverse logistics. One of our key findings is that, when it comes to the value generated by reverse logistics, demand variability (i.e., demand uncertainty across periods) matters more than demand volatility (i.e., demand uncertainty within each period). This is because, in the absence of demand variability, it is effectively never optimal to return product upstream, regardless of the level of inherent demand volatility. Our second objective is to extend our analysis to product-transforming supply chains, in which product transformation is allowed to occur at each location. In such a system, it becomes necessary to keep track of both the location and the stage of completion of each unit of inventory, so that the number of state and decision variables increases with the square of the number of locations in the system. To analyze such a supply chain, we first identify a policy that provides a lower bound on the total cost. Then, we establish a special decomposition of the objective cost function that allows us to propose a novel heuristic policy. We find that the performance gap of our heuristic policy relative to the lower-bounding policy averages less than 5% across a range of parameters and supply chain lengths.
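As a rough illustration only, here is one plausible reading of a sequentially nested base-stock decision at a single location in Python. The level names and their ordering are assumptions for the sketch; the paper derives the actual levels and nesting from its dynamic program:

```python
def nested_flows(echelon_position: float,
                 s_exp: float, s_reg: float, s_ret: float):
    """Hypothetical sketch of a nested echelon base-stock decision.

    `s_exp <= s_reg <= s_ret` are assumed nested base-stock levels for
    the expedited, regular, and reverse flows. Returns the expedited
    order, the regular order, and the quantity returned upstream.
    """
    assert s_exp <= s_reg <= s_ret, "levels must be nested"
    expedite = max(s_exp - echelon_position, 0.0)   # fast, costly flow
    position = echelon_position + expedite
    regular = max(s_reg - position, 0.0)            # standard flow
    position += regular
    send_back = max(position - s_ret, 0.0)          # reverse flow upstream
    return expedite, regular, send_back
```

Because the levels are nested, at most one of the forward and reverse flows is active in a given period, which mirrors the sequential nesting of decisions described above.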
2. We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with a significant literature establishing the near optimality of a simple class of policies called “base-stock policies,” as well as the convexity of the long-run average cost under those policies. We consider the relatively less studied problem of designing a learning algorithm for this setting when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm when compared with the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of [Formula: see text] for the inventory control problem. Here, L is the fixed and known lead time, and D is an unknown parameter of the demand distribution, described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on L, which significantly improves the previously best-known regret bounds for this problem, in which the dependence on L was exponential. Our techniques utilize the convexity of the long-run average cost and a newly derived bound on the “bias” of base-stock policies to establish an almost black-box connection between the problem of learning in Markov decision processes (MDPs) with these properties and the stochastic convex bandit problem. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex asymptotic average cost functions.
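To make the setting concrete, the following Python sketch simulates the lost-sales base-stock dynamics with lead time and censored demand that such a learning algorithm interacts with. The function and parameter names are illustrative; the learning procedure itself (tuning the base-stock level S from censored sales via convex bandit techniques) is not shown:

```python
from collections import deque

def simulate_base_stock(S: float, lead_time: int, demands, on_hand=0.0):
    """Sketch of lost-sales base-stock dynamics under censored demand.

    Each period, an order raises the inventory *position* (on-hand plus
    pipeline) to S and arrives after `lead_time` periods. Only sales =
    min(demand, on-hand) are observed, so demand is censored; excess
    demand is lost. Returns the censored sales observations.
    """
    pipeline = deque([0.0] * lead_time, maxlen=lead_time)
    observed_sales = []
    for d in demands:
        on_hand += pipeline.popleft() if lead_time else 0.0  # order arrives
        position = on_hand + sum(pipeline)
        order = max(S - position, 0.0)                       # order up to S
        if lead_time:
            pipeline.append(order)
        else:
            on_hand += order
        sales = min(d, on_hand)       # censored observation of demand
        on_hand -= sales              # unmet demand is lost, not backlogged
        observed_sales.append(sales)
    return observed_sales

# Example: simulate_base_stock(S=10.0, lead_time=2, demands=[4, 7, 3, 9])
```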
3. In consulting, finance, and other service industries, customers represent a revenue stream and must be acquired and retained over time. In this paper, we study the resource allocation problem of a profit-maximizing service firm that dynamically allocates its resources toward acquiring new clients and retaining unsatisfied existing ones. The interaction between acquisition and retention in our model is reflected in the cash constraint on total expected spending on acquisition and retention in each period. We formulate this problem as a dynamic program in which the firm makes decisions on both acquisition and retention after observing the current size of its customer base and receiving information about customers in danger of attrition, and we characterize the structure of the optimal acquisition and retention strategy. We show that when the firm's customer base is relatively small, the firm should spend heavily on acquisition and try to retain every unhappy customer. However, as its customer base grows, the firm should gradually shift its emphasis from acquisition to retention, and it should also aim to strike a balance between acquisition and retention when spending its available resources. Finally, when the customer base is large enough, it may be optimal for the firm to begin spending less on both acquisition and retention. We also extend our analysis to situations where the acquisition or retention success rate, as a function of the resources allocated, is uncertain, and we show that the optimal acquisition and retention policy can be surprisingly complex; for that case, we develop an effective heuristic. This paper aims to provide service managers with analytical principles and effective guidelines on resource allocation between these two significant activities, based on their firm's customer base size.
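As a toy illustration of the qualitative structure described above, the Python sketch below encodes the three regimes. All thresholds, costs, and weights are hypothetical placeholders; the paper's actual policy comes from the dynamic program:

```python
def allocate_budget(base_size: int, unhappy: int, budget: float,
                    n_low: int = 100, n_high: int = 1000,
                    cost_per_retain: float = 1.0):
    """Toy sketch of the acquisition/retention spending structure.

    `n_low`, `n_high`, and `cost_per_retain` are hypothetical
    parameters. Returns (acquisition_spend, retention_spend), jointly
    capped by the per-period cash constraint `budget`.
    """
    if base_size < n_low:
        # Small base: retain every unhappy customer, put the rest
        # of the budget into acquisition.
        retention = min(unhappy * cost_per_retain, budget)
        return budget - retention, retention
    elif base_size < n_high:
        # Growing base: gradually shift emphasis toward retention.
        w = (base_size - n_low) / (n_high - n_low)
        retention = min(unhappy * cost_per_retain, w * budget)
        return budget - retention, retention
    else:
        # Large base: it may be optimal to spend less on both,
        # leaving part of the budget unspent.
        retention = min(unhappy * cost_per_retain, 0.5 * budget)
        return 0.25 * budget, retention
```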

4. Learning to plan for long horizons is a central challenge in episodic reinforcement learning problems. A fundamental question is to understand how the difficulty of the problem scales as the horizon increases. Here the natural measure of sample complexity is a normalized one: we are interested in the number of episodes it takes to provably discover a policy whose value is ε-close to the optimal value, where value is measured by the normalized cumulative reward in each episode. In a COLT 2018 open problem, Jiang and Agarwal conjectured that, for tabular, episodic reinforcement learning problems, there exists a sample complexity lower bound that exhibits a polynomial dependence on the horizon, a conjecture consistent with all known sample complexity upper bounds. This work refutes the conjecture, proving that tabular, episodic reinforcement learning is possible with a sample complexity that scales only logarithmically with the planning horizon. In other words, when the values are appropriately normalized (to lie in the unit interval), this result shows that long-horizon RL is no more difficult than short-horizon RL, at least in a minimax sense. Our analysis introduces two ideas: (i) the construction of an ε-net for near-optimal policies whose log-covering number scales only logarithmically with the planning horizon, and (ii) the Online Trajectory Synthesis algorithm, which adaptively evaluates all policies in a given policy class and enjoys a sample complexity that scales logarithmically with the cardinality of the given policy class. Both may be of independent interest.
5. We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s, S, p) policy is proved to be optimal, where s and S are the reorder point and order-up-to level for the ordering strategy, and p, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s, S, p) policy with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal (s, S, p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization problems. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transitions over multiple periods resembles the reinforcement learning problem and is considerably more difficult than the settings addressed by existing bandit learning algorithms. Second, the pricing function p is infinite-dimensional, and approximating it is much more challenging than estimating a finite number of parameters, as in existing research. The demand-price relationship is estimated based on an upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion. Finally, because of the multiperiod nature of (s, S, p) policies, the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy p, and this distribution is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed from dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
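For concreteness, here is a minimal Python sketch of how an (s, S, p) policy acts in a single period, with a hypothetical linear pricing rule standing in for the infinite-dimensional pricing function p:

```python
def s_S_p_action(on_hand: float, s: float, S: float, price_fn):
    """Sketch of one period of an (s, S, p) policy.

    `s` and `S` are the reorder point and order-up-to level;
    `price_fn` maps the post-ordering inventory level to a price,
    playing the role of the pricing strategy p. Returns the order
    quantity and the posted price.
    """
    if on_hand <= s:
        order = S - on_hand   # pay the fixed cost and order up to S
        level = S
    else:
        order = 0.0           # fixed cost makes small orders wasteful
        level = on_hand
    return order, price_fn(level)

# Example with purely illustrative numbers: charge less when holding
# more stock, to stimulate the price-dependent demand.
order, price = s_S_p_action(on_hand=3.0, s=5.0, S=20.0,
                            price_fn=lambda x: max(10.0 - 0.1 * x, 1.0))
```

The fixed ordering cost is what produces the (s, S) band: ordering only when inventory drops to s or below amortizes the fixed cost over a larger batch.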