Increased availability of high-quality customer information has fueled interest in personalized pricing strategies, that is, strategies that predict an individual customer’s valuation for a product and then offer a price tailored to that customer. Although the appeal of personalized pricing is clear, it may also incur large costs in the forms of market research, investment in information technology and analytics expertise, and branding risks. In light of these trade-offs, our work studies the value of personalized pricing strategies over a simple single-price strategy. We first provide closed-form lower and upper bounds on the ratio between the profits of an idealized personalized pricing strategy (first-degree price discrimination) and a single-price strategy. Our bounds depend on simple statistics of the valuation distribution and shed light on the types of markets for which personalized pricing has little or significant potential value. Second, we consider a feature-based pricing model where customer valuations can be estimated from observed features. We show how to transform our aforementioned bounds into lower and upper bounds on the value of feature-based pricing over single pricing depending on the degree to which the features are informative for the valuation. Finally, we demonstrate how to obtain sharper bounds by incorporating additional information about the valuation distribution (moments or shape constraints) by solving tractable linear optimization problems. This paper was accepted by David Simchi-Levi, revenue management and market analytics.
more »
« less
Distribution-Free Contextual Dynamic Pricing
Contextual dynamic pricing aims to set personalized prices based on sequential interactions with customers. At each time period, a customer who is interested in purchasing a product comes to the platform. The customer’s valuation for the product is a linear function of contexts, including product and customer features, plus some random market noise. The seller does not observe the customer’s true valuation, but instead needs to learn the valuation by leveraging contextual information and historic binary purchase feedback. Existing models typically assume full or partial knowledge of the random noise distribution. In this paper, we consider contextual dynamic pricing with unknown random noise in the linear valuation model. Our distribution-free pricing policy learns both the contextual function and the market noise simultaneously. A key ingredient of our method is a novel perturbed linear bandit framework, in which a modified linear upper confidence bound algorithm is proposed to balance the exploration of market noise and the exploitation of the current knowledge for better pricing. We establish the regret upper bound and a matching lower bound of our policy in the perturbed linear bandit framework and prove a sublinear regret bound in the considered pricing problem. Finally, we demonstrate the superior performance of our policy on simulations and a real-life auto loan data set. Funding: Y. Liu and W.W. Sun acknowledge support from the National Science Foundation Division of Social and Economic Sciences [Grant NSF-SES 2217440]. Supplemental Material: The supplementary material is available at https://doi.org/10.1287/moor.2023.1369 .
more »
« less
- Award ID(s):
- 2217440
- PAR ID:
- 10414473
- Date Published:
- Journal Name:
- Mathematics of Operations Research
- ISSN:
- 0364-765X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)The prevalence of e-commerce has made customers’ detailed personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When using personalized information, the question of how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over T time periods with an unknown demand function of posted price and personalized information. At each time t, the retailer observes an arriving customer’s personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a third-party agent might infer the personalized information and purchase decisions from price changes in the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy, which tries to maximize the retailer revenue while avoiding information leakage of individual customer’s information and purchasing decisions. To this end, we first introduce a notion of anticipating [Formula: see text]-differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both the privacy guarantee and the performance guarantee in terms of regret. Roughly speaking, for d-dimensional personalized information, our algorithm achieves the expected regret at the order of [Formula: see text] when the customers’ information is adversarially chosen. For stochastic personalized information, the regret bound can be further improved to [Formula: see text]. This paper was accepted by J. George Shanthikumar, big data analytics.more » « less
-
In this paper, we study the problem of optimal data collection for policy evaluation in linear bandits. In policy evaluation, we are given a \textit{target} policy and asked to estimate the expected reward it will obtain when executed in a multi-armed bandit environment. Our work is the first work that focuses on such an optimal data collection strategy for policy evaluation involving heteroscedastic reward noise in the linear bandit setting. We first formulate an optimal design for weighted least squares estimates in the heteroscedastic linear bandit setting with the knowledge of noise variances. This design minimizes the mean squared error (MSE) of the estimated value of the target policy and is termed the oracle design. Since the noise variance is typically unknown, we then introduce a novel algorithm, SPEED (\textbf{S}tructured \textbf{P}olicy \textbf{E}valuation \textbf{E}xperimental \textbf{D}esign), that tracks the oracle design and derive its regret with respect to the oracle design. We show that regret scales as 𝑂˜(𝑑3𝑛−3/2) and prove a matching lower bound of Ω(𝑑2𝑛−3/2) . Finally, we evaluate SPEED on a set of policy evaluation tasks and demonstrate that it achieves MSE comparable to an optimal oracle and much lower than simply running the target policy.more » « less
-
We propose a differentially private linear contextual bandit algorithm, via a tree-based mechanism to add Laplace or Gaussian noise to model parameters. Our key insight is that as the model converges during online update, the global sensitivity of its parameters shrinks over time (thus named dynamic global sensitivity). Compared with existing solutions, our dynamic global sensitivity analysis allows us to inject less noise to obtain $$(\epsilon, \delta)$$-differential privacy with added regret caused by noise injection in $$\tilde O(\log{T}\sqrt{T}/\epsilon)$$. We provide a rigorous theoretical analysis over the amount of noise added via dynamic global sensitivity and the corresponding upper regret bound of our proposed algorithm. Experimental results on both synthetic and real-world datasets confirmed the algorithm's advantage against existing solutions.more » « less
-
We study optimal pricing in a single server queue when the customers valuation of service depends on their waiting time. In particular, we consider a very general model, where the customer valuations are random and are sampled from a distribution that depends on the queue length. The goal of the service provider is to set dynamic state dependent prices in order to maximize its revenue, while also managing congestion. We model the problem as a Markov decision process and present structural results on the optimal policy. We also present an algorithm to find an approximate optimal policy. We further present a myopic policy that is easy to evaluate and present bounds on its performance. We finally illustrate the quality of our approximate solution and the myopic solution using numerical simulations.more » « less
An official website of the United States government

