The prevalence of e-commerce has made customers’ detailed personal information readily accessible to retailers, and this information has been widely used in pricing decisions. When using personalized information, the question of how to protect the privacy of such information becomes a critical issue in practice. In this paper, we consider a dynamic pricing problem over T time periods with an unknown demand function of posted price and personalized information. At each time t, the retailer observes an arriving customer’s personal information and offers a price. The customer then makes the purchase decision, which will be utilized by the retailer to learn the underlying demand function. There is potentially a serious privacy concern during this process: a third-party agent might infer the personalized information and purchase decisions from price changes in the pricing system. Using the fundamental framework of differential privacy from computer science, we develop a privacy-preserving dynamic pricing policy, which tries to maximize the retailer revenue while avoiding information leakage of individual customer’s information and purchasing decisions. To this end, we first introduce a notion of anticipating [Formula: see text]-differential privacy that is tailored to the dynamic pricing problem. Our policy achieves both the privacy guarantee and the performancemore »
This content will become publicly available on August 1, 2023
Differential Privacy in Personalized Pricing with Nonparametric Demand Models
In recent decades, the advance of information technology and abundant personal data facilitate the application of algorithmic personalized pricing. However, this leads to the growing concern of potential violation of privacy because of adversarial attack. To address the privacy issue, this paper studies a dynamic personalized pricing problem with unknown nonparametric demand models under data privacy protection. Two concepts of data privacy, which have been widely applied in practices, are introduced: central differential privacy (CDP) and local differential privacy (LDP), which is proved to be stronger than CDP in many cases. We develop two algorithms that make pricing decisions and learn the unknown demand on the fly while satisfying the CDP and LDP guarantee, respectively. In particular, for the algorithm with CDP guarantee, the regret is proved to be at most [Formula: see text]. Here, the parameter T denotes the length of the time horizon, d is the dimension of the personalized information vector, and the key parameter [Formula: see text] measures the strength of privacy (smaller ε indicates a stronger privacy protection). Conversely, for the algorithm with LDP guarantee, its regret is proved to be at most [Formula: see text], which is near optimal as we prove a lower more »
- Award ID(s):
- 1845444
- Publication Date:
- NSF-PAR ID:
- 10376128
- Journal Name:
- Operations Research
- ISSN:
- 0030-364X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated [Formula: see text] policy is proved to be optimal, where s and S are the reorder point and order-up-to level for ordering strategy, and [Formula: see text], a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal [Formula: see text] with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from the existing online learning researches in the operations management (OM) literature. First, computing the optimal [Formula: see text] policy requires solving a dynamic programming (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transition over multiple periods resembles the reinforcement learning question, considerably more difficult thanmore »
-
This paper studies a dynamic pricing problem under model misspecification. To characterize model misspecification, we adopt the ε-contamination model—the most fundamental model in robust statistics and machine learning. In particular, for a selling horizon of length T, the online ε-contamination model assumes that demands are realized according to a typical unknown demand function only for [Formula: see text] periods. For the rest of [Formula: see text] periods, an outlier purchase can happen with arbitrary demand functions. The challenges brought by the presence of outlier customers are mainly due to the fact that arrivals of outliers and their exhibited demand behaviors are completely arbitrary, therefore calling for robust estimation and exploration strategies that can handle any outlier arrival and demand patterns. We first consider unconstrained dynamic pricing without any inventory constraint. In this case, we adopt the Follow-the-Regularized-Leader algorithm to hedge against outlier purchase behavior. Then, we introduce inventory constraints. When the inventory is insufficient, we study a robust bisection-search algorithm to identify the clearance price—that is, the price at which the initial inventory is expected to clear at the end of T periods. Finally, we study the general dynamic pricing case, where a retailer has no clue whether the inventorymore »
-
Motivated by personalized healthcare and other applications involving sensitive data, we study online exploration in reinforcement learning with differential privacy (DP) constraints. Existing work on this problem established that no-regret learning is possible under joint differential privacy (JDP) and local differential privacy (LDP) but did not provide an algorithm with optimal regret. We close this gap for the JDP case by designing an ϵ-JDP algorithm with a regret of O˜(sqrt(SAH^2T) +S^2AH^3/ϵ) which matches the information-theoretic lower bound of non-private learning for all choices of ϵ>S^1.5A^0.5H^2/sqrt(T). In the above, S, A denote the number of states and actions, H denotes the planning horizon, and T is the number of steps. To the best of our knowledge, this is the first private RL algorithm that achieves \emph{privacy for free} asymptotically as T→∞. Our techniques -- which could be of independent interest -- include privately releasing Bernstein-type exploration bonuses and an improved method for releasing visitation statistics. The same techniques also imply a slightly improved regret bound for the LDP case.
-
We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with significant literature establishing near optimality of a simple class of policies called “base-stock policies” as well as the convexity of long-run average cost under those policies. We consider a relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm when compared with the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of [Formula: see text] for the inventory control problem. Here, [Formula: see text] is the fixed and known lead time, and D is an unknown parameter of the demand distribution described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on L, which significantly improves the previously best-known regret bounds for this problem where the dependence on L was exponential. Our techniques utilize the convexity of the long-run average cost and a newly derived bound on the “bias” of base-stock policies to establishmore »