skip to main content


Title: Distribution-Free Contextual Dynamic Pricing
Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer’s valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer’s true valuation, but instead needs to learn the valuation by leveraging contextual information and historic binary purchase feedback. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the linear valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, in which a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sublinear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto loan data set. Funding: Y. Liu and W.W. Sun acknowledge support from the National Science Foundation Division of Social and Economic Sciences [Grant NSF-SES 2217440]. Supplemental Material: The supplementary material is available at https://doi.org/10.1287/moor.2023.1369 .  more » « less
Award ID(s):
2217440
NSF-PAR ID:
10414473
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Mathematics of Operations Research
ISSN:
0364-765X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Increased availability of high-quality customer information has fueled interest in personalized pricing strategies, that is, strategies that predict an individual customer’s valuation for a product and then offer a price tailored to that customer. Although the appeal of personalized pricing is clear, it may also incur large costs in the forms of market research, investment in information technology and analytics expertise, and branding risks. In light of these trade-offs, our work studies the value of personalized pricing strategies over a simple single-price strategy. We first provide closed-form lower and upper bounds on the ratio between the profits of an idealized personalized pricing strategy (first-degree price discrimination) and a single-price strategy. Our bounds depend on simple statistics of the valuation distribution and shed light on the types of markets for which personalized pricing has little or significant potential value. Second, we consider a feature-based pricing model where customer valuations can be estimated from observed features. We show how to transform our aforementioned bounds into lower and upper bounds on the value of feature-based pricing over single pricing depending on the degree to which the features are informative for the valuation. Finally, we demonstrate how to obtain sharper bounds by incorporating additional information about the valuation distribution (moments or shape constraints) by solving tractable linear optimization problems. This paper was accepted by David Simchi-Levi, revenue management and market analytics. 
    more » « less
  2. null (Ed.)
    The prevalence of e-commerce has made customers’ detailed personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When using personalized information, the question of how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over T time periods with an unknown demand function of posted price and personalized information. At each time t, the retailer observes an arriving customer’s personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a third-party agent might infer the personalized information and purchase decisions from price changes in the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy, which tries to maximize the retailer revenue while avoiding information leakage of individual customer’s information and purchasing decisions. To this end, we first introduce a notion of anticipating [Formula: see text]-differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both the privacy guarantee and the performance guarantee in terms of regret. Roughly speaking, for d-dimensional personalized information, our algorithm achieves the expected regret at the order of [Formula: see text] when the customers’ information is adversarially chosen. For stochastic personalized information, the regret bound can be further improved to [Formula: see text]. This paper was accepted by J. George Shanthikumar, big data analytics. 
    more » « less
  3. We propose a differentially private linear contextual bandit algorithm, via a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online update, the global sensitivity of its parameters shrinks over time (thus named dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $(\epsilon, \delta)$-differential privacy with added regret caused by noise injection in $\tilde O(\log{T}\sqrt{T}/\epsilon)$. We provide a rigorous theoretical analysis over the amount of noise added via dynamic global sensitivity and the corresponding upper regret bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirmed the algorithm's advantage against existing solutions. 
    more » « less
  4. Koyejo, S. ; Mohamed, S. ; Agarwal, A. ; Belgrave, D. ; Cho, K. ; Oh, A. (Ed.)
    In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the (ǫ, δ)-PAC setting: given a policy class Π the goal of the learner is to return a policy π ∈ Π whose expected reward is within ǫ of the optimal policy with probability greater than 1 − δ. We characterize the first instance-dependent PAC sample complexity of contextual bandits through a quantity ρΠ, and provide matching upper and lower bounds in terms of ρΠ for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle. 
    more » « less
  5. null (Ed.)
    We study the dynamic assortment planning problem, where for each arriving customer, the seller offers an assortment of substitutable products and the customer makes the purchase among offered products according to an uncapacitated multinomial logit (MNL) model. Because all the utility parameters of the MNL model are unknown, the seller needs to simultaneously learn customers’ choice behavior and make dynamic decisions on assortments based on the current knowledge. The goal of the seller is to maximize the expected revenue, or, equivalently, to minimize the expected regret. Although dynamic assortment planning problem has received an increasing attention in revenue management, most existing policies require the estimation of mean utility for each product and the final regret usually involves the number of products [Formula: see text]. The optimal regret of the dynamic assortment planning problem under the most basic and popular choice model—the MNL model—is still open. By carefully analyzing a revenue potential function, we develop a trisection-based policy combined with adaptive confidence bound construction, which achieves an item-independent regret bound of [Formula: see text], where [Formula: see text] is the length of selling horizon. We further establish the matching lower bound result to show the optimality of our policy. There are two major advantages of the proposed policy. First, the regret of all our policies has no dependence on [Formula: see text]. Second, our policies are almost assumption-free: there is no assumption on mean utility nor any “separability” condition on the expected revenues for different assortments. We also extend our trisection search algorithm to capacitated MNL models and obtain the optimal regret [Formula: see text] (up to logrithmic factors) without any assumption on the mean utility parameters of items. 
    more » « less