NSF PAR Search | NSF Public Access Repository

Dynamic Car Dispatching and Pricing: Revenue and Fairness for Ridesharing Platforms

Zishuo Zhao; Xi Chen; Xuefeng Zhang; Yuan Zhou (July 2022, International Joint Conference on Artificial Intelligence)

Full Text Available
Learning Long-Term Reward Redistribution via Randomized Return Decomposition

Zhizhou Ren; Ruihan Gao; Yuan Zhou; Jian Peng (April 2022, International Conference on Learning Representations)

Full Text Available
Dynamic Pricing and Inventory Control with Fixed Ordering Cost and Incomplete Demand Information

https://doi.org/10.1287/mnsc.2021.4171

Chen, Boxiao; Simchi-Levi, David; Wang, Yining; Zhou, Yuan (December 2021, Management Science)

We consider the periodic review dynamic pricing and inventory control problem with fixed ordering cost. Demand is random and price dependent, and unsatisfied demand is backlogged. With complete demand information, the celebrated (s, S, p) policy is proved to be optimal, where s and S are the reorder point and order-up-to level for the ordering strategy, and p, a function of on-hand inventory level, characterizes the pricing strategy. In this paper, we consider incomplete demand information and develop online learning algorithms whose average profit approaches that of the optimal (s, S, p) policy with a tight [Formula: see text] regret rate. A number of salient features differentiate our work from existing online learning research in the operations management (OM) literature. First, computing the optimal (s, S, p) policy requires solving a dynamic program (DP) over multiple periods involving unknown quantities, which is different from the majority of learning problems in OM that only require solving single-period optimization questions. It is hence challenging to establish stability results through DP recursions, which we accomplish by proving uniform convergence of the profit-to-go function. The necessity of analyzing action-dependent state transitions over multiple periods resembles the reinforcement learning question, which is considerably more difficult than existing bandit learning problems. Second, the pricing function p is of infinite dimension, and approaching it is much more challenging than approaching a finite number of parameters as seen in existing research. The demand-price relationship is estimated based on an upper confidence bound, but the confidence interval cannot be explicitly calculated due to the complexity of the DP recursion. Finally, because of the multiperiod nature of (s, S, p) policies, the actual distribution of the randomness in demand plays an important role in determining the optimal pricing strategy p, which is unknown to the learner a priori. In this paper, the demand randomness is approximated by an empirical distribution constructed using dependent samples, and a novel Wasserstein metric-based argument is employed to prove convergence of the empirical distribution. This paper was accepted by J. George Shanthikumar, big data analytics.
Full Text Available
Linear bandits with limited adaptivity and learning distributional optimal design

https://doi.org/10.1145/3406325.3451004

Ruan, Yufei; Yang, Jiaqi; Zhou, Yuan (June 2021, STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing)
Full Text Available
Tight Regret Bounds for Infinite-armed Linear Contextual Bandits

Li, Yingkai; Wang, Yining; Chen, Xi; Zhou, Yuan (January 2021, Proceedings of The 24th International Conference on Artificial Intelligence and Statistics)
Full Text Available
Dynamic Assortment Planning Under Nested Logit Models

https://doi.org/10.1111/POMS.13258

Chen, Xi; Shi, Chao; Wang, Yining; Zhou, Yuan (January 2021, Production and Operations Management)
Full Text Available
Near-Optimal MNL Bandits Under Risk Criteria

Xi, Guangyu; Tao, Chao; Zhou, Yuan (January 2021, The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21))
Full Text Available
Collaborative Top Distribution Identifications with Limited Interaction (Extended Abstract)

https://doi.org/10.1109/FOCS46700.2020.00024

Karpov, Nikolai; Zhang, Qin; Zhou, Yuan (November 2020, 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS))
Full Text Available
Learning Guidance Rewards with Trajectory-space Smoothing

Gangwani, Tanmay; Zhou, Yuan; Peng, Jian (January 2020, Advances in Neural Information Processing Systems 33 (NeurIPS 2020))
Full Text Available
Dynamic Assortment Optimization with Changing Contextual Information

Chen, Xi; Wang, Yining; Zhou, Yuan (January 2020, Journal of Machine Learning Research)
Full Text Available
