Authors: Chen, Boxiao; Simchi-Levi, David; Wang, Yining; Zhou, Yuan

We consider the periodic-review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated $(s, S, p)$ policy is proved to be optimal, where $s$ and $S$ are the reorder point and order-up-to level of the ordering strategy, and $p$, a function of the on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal $(s, S, p)$ policy with a tight $\tilde{O}(\sqrt{T})$ regret rate over $T$ periods. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal $(s, S, p)$ policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, whereas the majority of learning problems in OM require solving only single-period optimization problems. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The need to analyze action-dependent state transitions over multiple periods resembles a reinforcement learning problem and is considerably more difficult than the settings handled by existing bandit learning algorithms. Second, the pricing function $p$ is infinite-dimensional, and learning it is much more challenging than learning a finite number of parameters, as in existing research. The demand-price relationship is estimated using upper confidence bounds, but the confidence interval cannot be calculated explicitly because of the complexity of the DP recursion.
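To make the structure of an $(s, S, p)$ policy concrete, here is a minimal Python sketch that executes a given policy on a simulated demand path with backlogging and a fixed ordering cost. It is not the paper's learning algorithm. Everything numeric is a hypothetical assumption for illustration: the linear demand curve $a - \beta p$ with additive noise, the cost parameters `K`, `c`, `h`, `b`, and the pricing function in the usage example. The sketch also assumes zero lead time and sets the price from the inventory level after the ordering decision.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class SSPPolicy:
    """An (s, S, p) policy: reorder point s, order-up-to level S, pricing function p."""
    s: float
    S: float
    price: Callable[[float], float]  # assumption: price depends on post-order inventory

    def act(self, x: float) -> Tuple[float, float]:
        """Return (order quantity, price) for starting inventory x (negative means backlog)."""
        q = self.S - x if x < self.s else 0.0  # order up to S only when inventory is below s
        return q, self.price(x + q)


def simulate(policy: SSPPolicy, x0: float, noises: Sequence[float],
             a: float = 10.0, beta: float = 1.0,  # hypothetical linear demand a - beta * p
             K: float = 5.0, c: float = 1.0,      # hypothetical fixed and unit ordering costs
             h: float = 0.5, b: float = 2.0       # hypothetical holding and backlog unit costs
             ) -> Tuple[float, List[Tuple[float, float, float]]]:
    """Run the policy for len(noises) periods; return total profit and (q, p, x_next) per period."""
    x, total = x0, 0.0
    history: List[Tuple[float, float, float]] = []
    for eps in noises:
        q, p = policy.act(x)
        demand = max(a - beta * p + eps, 0.0)  # price-dependent mean plus additive noise
        x_next = x + q - demand  # unmet demand is backlogged, so x_next may be negative
        total += (p * demand
                  - (K if q > 0 else 0.0) - c * q  # fixed cost is charged only when ordering
                  - h * max(x_next, 0.0) - b * max(-x_next, 0.0))
        history.append((q, p, x_next))
        x = x_next
    return total, history


if __name__ == "__main__":
    # Hypothetical policy: reorder below 2, order up to 10, cut price as stock grows.
    policy = SSPPolicy(s=2.0, S=10.0, price=lambda y: 6.0 - 0.2 * y)
    total, history = simulate(policy, x0=0.0, noises=[0.0, 0.0])
    print(total, history)
```

In this run the policy orders 10 units in the first period (inventory 0 is below $s = 2$) and nothing in the second (inventory 4 is above $s$), which is exactly the reorder-point/order-up-to behavior the abstract describes. In the learning setting, the demand curve and noise distribution used inside `simulate` are the unknown quantities the algorithm must estimate.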
Finally, because of the multi-period nature of $(s, S, p)$ policies, the actual distribution of the demand randomness, which is unknown to the learner a priori, plays an important role in determining the optimal pricing strategy $p$. In this paper, the demand randomness is approximated by an empirical distribution constructed from dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
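The metric behind the convergence argument can be illustrated with the 1-Wasserstein distance between two one-dimensional empirical distributions, $W_1(F, G) = \int_{-\infty}^{\infty} |F(t) - G(t)|\,dt$. The sketch below computes only this distance; it does not reproduce the paper's proof or its handling of dependent samples. The usage example, which builds residuals from a hypothetical fitted linear demand curve (`a_hat`, `beta_hat`) and compares them with a hypothetical reference sample, is an illustrative assumption.

```python
from typing import Sequence


def wasserstein1(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W1 distance between the empirical distributions of two 1-D samples.

    Both empirical CDFs are step functions, so the integral of |F_x - F_y|
    reduces to a finite sum over the gaps of the merged, sorted support.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(xs), sorted(ys)
    points = sorted(set(a) | set(b))
    i = j = 0  # counts of samples in a (resp. b) that are <= the current left endpoint
    total = 0.0
    for left, right in zip(points, points[1:]):
        while i < len(a) and a[i] <= left:
            i += 1
        while j < len(b) and b[j] <= left:
            j += 1
        # Both CDFs are constant on [left, right), so this gap contributes a rectangle.
        total += abs(i / len(a) - j / len(b)) * (right - left)
    return total


if __name__ == "__main__":
    # Hypothetical observed prices and demands, and a hypothetical fitted curve a_hat - beta_hat * p.
    prices = [4.0, 5.0, 6.0]
    demands = [6.5, 4.0, 4.5]
    a_hat, beta_hat = 10.0, 1.0
    residuals = [d - (a_hat - beta_hat * p) for p, d in zip(prices, demands)]
    print(wasserstein1(residuals, [-0.5, 0.0, 0.5]))
```

Because every residual depends on the same fitted curve, the residuals are not independent draws, which is the kind of dependence the abstract refers to. For independent cross-checks on one-dimensional samples, `scipy.stats.wasserstein_distance` computes the same quantity.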