skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information
We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for ordering strategy, and [Formula: see text], a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning researches in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult than existing bandit learning algorithms. Second, the pricing function [Formula: see text] is of infinite dimension, and approaching it is much more challenging than approaching a finite number of parameters as seen in existing researches. The demand-price relationship is estimated based on upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion. Finally, because of the multiperiod nature of [Formula: see text] policies the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy [Formula: see text], which is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.  more » « less
Award ID(s):
2006526
PAR ID:
10342728
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Management Science
ISSN:
0025-1909
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies a dynamic pricing problem under model misspecification. To characterize model misspecification, we adopt the ε-contamination model—the most fundamental model in robust statistics and machine learning. In particular, for a selling horizon of length T, the online ε-contamination model assumes that demands are realized according to a typical unknown demand function only for [Formula: see text] periods. For the rest of [Formula: see text] periods, an outlier purchase can happen with arbitrary demand functions. The challenges brought by the presence of outlier customers are mainly due to the fact that arrivals of outliers and their exhibited demand behaviors are completely arbitrary, therefore calling for robust estimation and exploration strategies that can handle any outlier arrival and demand patterns. We first consider unconstrained dynamic pricing without any inventory constraint. In this case, we adopt the Follow-the-Regularized-Leader algorithm to hedge against outlier purchase behavior. Then, we introduce inventory constraints. When the inventory is insufficient, we study a robust bisection-search algorithm to identify the clearance price—that is, the price at which the initial inventory is expected to clear at the end of T periods. Finally, we study the general dynamic pricing case, where a retailer has no clue whether the inventory is sufficient or not. In this case, we design a meta-algorithm that combines the previous two policies. All algorithms are fully adaptive, without requiring prior knowledge of the outlier proportion parameter ε. Simulation study shows that our policy outperforms existing policies in the literature. 
    more » « less
  2. Problem definition: We study a feature-based pricing problem with demand censoring in an offline, data-driven setting. In this problem, a firm is endowed with a finite amount of inventory and faces a random demand that is dependent on the offered price and the features (from products, customers, or both). Any unsatisfied demand that exceeds the inventory level is lost and unobservable. The firm does not know the demand function but has access to an offline data set consisting of quadruplets of historical features, inventory, price, and potentially censored sales quantity. Our objective is to use the offline data set to find the optimal feature-based pricing rule so as to maximize the expected profit. Methodology/results: Through the lens of causal inference, we propose a novel data-driven algorithm that is motivated by survival analysis and doubly robust estimation. We derive a finite sample regret bound to justify the proposed offline learning algorithm and prove its robustness. Numerical experiments demonstrate the robust performance of our proposed algorithm in accurately estimating optimal prices on both training and testing data. Managerial implications: The work provides practitioners with an innovative modeling and algorithmic framework for the feature-based pricing problem with demand censoring through the lens of causal inference. Our numerical experiments underscore the value of considering demand censoring in the context of feature-based pricing. Funding: The research of E. Fang is partially supported by the National Science Foundation [Grants NSF DMS-2346292, NSF DMS-2434666] and the Whitehead Scholarship. The research of C. Shi is partially supported by the Amazon Research Award. Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2024.1061 . 
    more » « less
  3. null (Ed.)
    The prevalence of e-commerce has made customers’ detailed personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When using personalized information, the question of how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over T time periods with an unknown demand function of posted price and personalized information. At each time t, the retailer observes an arriving customer’s personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a third-party agent might infer the personalized information and purchase decisions from price changes in the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy, which tries to maximize the retailer revenue while avoiding information leakage of individual customer’s information and purchasing decisions. To this end, we first introduce a notion of anticipating [Formula: see text]-differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both the privacy guarantee and the performance guarantee in terms of regret. Roughly speaking, for d-dimensional personalized information, our algorithm achieves the expected regret at the order of [Formula: see text] when the customers’ information is adversarially chosen. For stochastic personalized information, the regret bound can be further improved to [Formula: see text]. This paper was accepted by J. George Shanthikumar, big data analytics. 
    more » « less
  4. Price-based revenue management is an important problem in operations management with many practical applications. The problem considers a seller who sells one or multiple products over T consecutive periods and is subject to constraints on the initial inventory levels of resources. Whereas, in theory, the optimal pricing policy could be obtained via dynamic programming, computing the exact dynamic programming solution is often intractable. Approximate policies, such as the resolving heuristics, are often applied as computationally tractable alternatives. In this paper, we show the following two results for price-based network revenue management under a continuous price set. First, we prove that a natural resolving heuristic attains O(1) regret compared with the value of the optimal policy. This improves the [Formula: see text] regret upper bound established in the prior work by Jasin in 2014. Second, we prove that there is an [Formula: see text] gap between the value of the optimal policy and that of the fluid model. This complements our upper bound result by showing that the fluid is not an adequate information-relaxed benchmark when analyzing price-based revenue management algorithms. Funding: This work was supported in part by the National Science Foundation [Grant CMMI-2145661]. 
    more » « less
  5. Guruswami, Venkatesan (Ed.)
    In decentralized finance ("DeFi"), automated market makers (AMMs) enable traders to programmatically exchange one asset for another. Such trades are enabled by the assets deposited by liquidity providers (LPs). The goal of this paper is to characterize and interpret the optimal (i.e., profit-maximizing) strategy of a monopolist liquidity provider, as a function of that LP’s beliefs about asset prices and trader behavior. We introduce a general framework for reasoning about AMMs based on a Bayesian-like belief inference framework, where LPs maintain an asset price estimate, which is updated by incorporating traders' price estimates. In this model, the market maker (i.e., LP) chooses a demand curve that specifies the quantity of a risky asset to be held at each dollar price. Traders arrive sequentially and submit a price bid that can be interpreted as their estimate of the risky asset price; the AMM responds to this submitted bid with an allocation of the risky asset to the trader, a payment that the trader must pay, and a revised internal estimate for the true asset price. We define an incentive-compatible (IC) AMM as one in which a trader’s optimal strategy is to submit its true estimate of the asset price, and characterize the IC AMMs as those with downward-sloping demand curves and payments defined by a formula familiar from Myerson’s optimal auction theory. We generalize Myerson’s virtual values, and characterize the profit-maximizing IC AMM. The optimal demand curve generally has a jump that can be interpreted as a "bid-ask spread," which we show is caused by a combination of adverse selection risk (dominant when the degree of information asymmetry is large) and monopoly pricing (dominant when asymmetry is small). This work opens up new research directions into the study of automated exchange mechanisms from the lens of optimal auction theory and iterative belief inference, using tools of theoretical computer science in a novel way. 
    more » « less