This paper studies a dynamic pricing problem under model misspecification. To characterize model misspecification, we adopt the εcontamination model—the most fundamental model in robust statistics and machine learning. In particular, for a selling horizon of length T, the online εcontamination model assumes that demands are realized according to a typical unknown demand function only for [Formula: see text] periods. For the rest of [Formula: see text] periods, an outlier purchase can happen with arbitrary demand functions. The challenges brought by the presence of outlier customers are mainly due to the fact that arrivals of outliers and their exhibited demand behaviors are completely arbitrary, therefore calling for robust estimation and exploration strategies that can handle any outlier arrival and demand patterns. We first consider unconstrained dynamic pricing without any inventory constraint. In this case, we adopt the FollowtheRegularizedLeader algorithm to hedge against outlier purchase behavior. Then, we introduce inventory constraints. When the inventory is insufficient, we study a robust bisectionsearch algorithm to identify the clearance price—that is, the price at which the initial inventory is expected to clear at the end of T periods. Finally, we study the general dynamic pricing case, where a retailer has no clue whether the inventorymore »
Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information
We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and orderupto level for ordering strategy, and [Formula: see text], a function of onhand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning researches in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving singleperiod optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profittogo function. The necessity of analyzing actiondependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult than more »
 Award ID(s):
 2006526
 Publication Date:
 NSFPAR ID:
 10342728
 Journal Name:
 Management Science
 ISSN:
 00251909
 Sponsoring Org:
 National Science Foundation
More Like this


The prevalence of ecommerce has made customers’ detailed personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When using personalized information, the question of how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over T time periods with an unknown demand function of posted price and personalized information. At each time t, the retailer observes an arriving customer’s personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a thirdparty agent might infer the personalized information and purchase decisions from price changes in the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacypreserving dynamic pricing policy, which tries to maximize the retailer revenue while avoiding information leakage of individual customer’s information and purchasing decisions. To this end, we first introduce a notion of anticipating [Formula: see text]differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both the privacy guarantee and the performancemore »

Pricebased revenue management is an important problem in operations management with many practical applications. The problem considers a seller who sells one or multiple products over T consecutive periods and is subject to constraints on the initial inventory levels of resources. Whereas, in theory, the optimal pricing policy could be obtained via dynamic programming, computing the exact dynamic programming solution is often intractable. Approximate policies, such as the resolving heuristics, are often applied as computationally tractable alternatives. In this paper, we show the following two results for pricebased network revenue management under a continuous price set. First, we prove that a natural resolving heuristic attains O(1) regret compared with the value of the optimal policy. This improves the [Formula: see text] regret upper bound established in the prior work by Jasin in 2014. Second, we prove that there is an [Formula: see text] gap between the value of the optimal policy and that of the fluid model. This complements our upper bound result by showing that the fluid is not an adequate informationrelaxed benchmark when analyzing pricebased revenue management algorithms. Funding: This work was supported in part by the National Science Foundation [Grant CMMI2145661].

In this work, we consider the popular treebased search strategy within the framework of reinforcement learning, the Monte Carlo tree search (MCTS), in the context of the infinitehorizon discounted cost Markov decision process (MDP). Although MCTS is believed to provide an approximate value function for a given state with enough simulations, the claimed proof of this property is incomplete. This is because the variant of MCTS, the upper confidence bound for trees (UCT), analyzed in prior works, uses “logarithmic” bonus term for balancing exploration and exploitation within the treebased search, following the insights from stochastic multiarm bandit (MAB) literature. In effect, such an approach assumes that the regret of the underlying recursively dependent nonstationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold, even for stationary MABs. As the key contribution of this work, we establish polynomial concentration property of regret for a class of nonstationary MABs. This in turn establishes that the MCTS with appropriate polynomial rather than logarithmic bonus term in UCB has a claimed property. Interestingly enough, empirically successful approaches use a similar polynomial form of MCTS as suggested by our result. Using this as a building block, we arguemore »

We consider a fundamental pricing model in which a fixed number of units of a reusable resource are used to serve customers. Customers arrive to the system according to a stochastic process and, upon arrival, decide whether to purchase the service, depending on their willingness to pay and the current price. The service time during which the resource is used by the customer is stochastic, and the firm may incur a service cost. This model represents various markets for reusable resources, such as cloud computing, shared vehicles, rotable parts, and hotel rooms. In the present paper, we analyze this pricing problem when the firm attempts to maximize a weighted combination of three central metrics: profit, market share, and service level. Under Poisson arrivals, exponential service times, and standard assumptions on the willingnesstopay distribution, we establish a series of results that characterize the performance of static pricing in such environments. In particular, although an optimal policy is fully dynamic in such a context, we prove that a static pricing policy simultaneously guarantees 78.9% of the profit, market share, and service level from the optimal policy. Notably, this result holds for any service rate and number of units the firm operates. Ourmore »