NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Data Dependent Regret Bounds for Online Portfolio Selection with Predicted Returns

Putta, Sudeep R; Agrawal, Shipra (February 2025, 36th International Conference on Algorithmic Learning Theory)

Free, publicly accessible full text available February 14, 2026
Dynamic Pricing and Learning with Long-term Reference Effects

https://doi.org/10.1145/3670865.3673599

Agrawal, Shipra; Tang, Wei (July 2024, Proceedings of the 25th ACM Conference on Economics and Computation (EC))

Full Text Available
Dynamic Pricing and Learning with Bayesian Persuasion

Agrawal, Shipra; Feng, Yiding; Tang, Wei (December 2023, 37th Conference on Neural Information Processing Systems (NeurIPS 2023))

Full Text Available
Learning in Structured MDPs with Convex Cost Functions: Improved Regret Bounds for Inventory Management

https://doi.org/10.1287/opre.2022.2263

Agrawal, Shipra; Jia, Randy (May 2022, Operations Research)

We consider a stochastic inventory control problem under censored demand, lost sales, and positive lead times. This is a fundamental problem in inventory management, with significant literature establishing near optimality of a simple class of policies called “base-stock policies” as well as the convexity of long-run average cost under those policies. We consider a relatively less studied problem of designing a learning algorithm for this problem when the underlying demand distribution is unknown. The goal is to bound the regret of the algorithm when compared with the best base-stock policy. Our main contribution is a learning algorithm with a regret bound of $\tilde{O}(L\sqrt{T} + D)$ for the inventory control problem. Here, $L \ge 0$ is the fixed and known lead time, and $D$ is an unknown parameter of the demand distribution described roughly as the expected number of time steps needed to generate enough demand to deplete one unit of inventory. Notably, our regret bounds depend linearly on $L$, which significantly improves the previously best-known regret bounds for this problem where the dependence on $L$ was exponential. Our techniques utilize the convexity of the long-run average cost and a newly derived bound on the “bias” of base-stock policies to establish an almost black box connection between the problem of learning in Markov decision processes (MDPs) with these properties and the stochastic convex bandit problem. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex asymptotic average cost functions.
Full Text Available
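The abstract above measures regret against the best base-stock policy. As a rough illustration of that benchmark only (not the paper's learning algorithm), the sketch below simulates a base-stock (order-up-to) policy under lost sales, censored demand, and a fixed lead time, then picks the best level by brute force when the demand distribution is known. The Poisson demand, the holding and lost-sales cost parameters, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import deque


def sample_poisson(rng: random.Random, lam: float) -> int:
    """Knuth's method (assumed demand model); fine for small demand rates."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1


def average_cost(base_stock: int, lead_time: int, horizon: int,
                 demand_rate: float, holding: float, lost_sale: float,
                 seed: int = 0) -> float:
    """Simulated average per-period cost of a base-stock policy (hypothetical costs)."""
    rng = random.Random(seed)
    on_hand = 0
    pipeline = deque([0] * lead_time)  # orders placed but not yet delivered
    total = 0.0
    for _ in range(horizon):
        if lead_time > 0:
            on_hand += pipeline.popleft()  # order placed lead_time periods ago arrives
        position = on_hand + sum(pipeline)  # inventory position = on hand + in transit
        order = max(0, base_stock - position)  # order up to the base-stock level
        if lead_time > 0:
            pipeline.append(order)
        else:
            on_hand += order
        demand = sample_poisson(rng, demand_rate)
        sales = min(demand, on_hand)  # only sales are observed: demand is censored
        lost = demand - sales         # unmet demand is lost, not backlogged
        on_hand -= sales
        total += holding * on_hand + lost_sale * lost  # end-of-period holding cost
    return total / horizon


def best_base_stock(levels, **kwargs):
    """Brute-force the best level; the shared seed gives common random numbers."""
    costs = {s: average_cost(s, **kwargs) for s in levels}
    return min(costs, key=costs.get), costs
```

For example, `best_base_stock(range(15), lead_time=2, horizon=20000, demand_rate=2.0, holding=1.0, lost_sale=5.0)` returns the best level and the cost of every candidate. A learner that does not know the demand distribution would incur regret equal to its cumulative cost minus T times the best average cost.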
Optimistic Posterior Sampling for Reinforcement Learning: Worst-Case Regret Bounds

https://doi.org/10.1287/moor.2022.1266

Agrawal, Shipra; Jia, Randy (May 2022, Mathematics of Operations Research)

We present an algorithm based on posterior sampling (also known as Thompson sampling) that achieves near-optimal worst-case regret bounds when the underlying Markov decision process (MDP) is communicating with a finite, although unknown, diameter. Our main result is a high-probability regret upper bound of $\tilde{O}(DS\sqrt{AT})$ for any communicating MDP with $S$ states, $A$ actions, and diameter $D$. Here, regret compares the total reward achieved by the algorithm to the total expected reward of an optimal infinite-horizon undiscounted average reward policy over time horizon $T$. This result closely matches the known lower bound of $\Omega(\sqrt{DSAT})$. Our techniques involve proving some novel results about the anti-concentration of the Dirichlet distribution, which may be of independent interest.
Full Text Available
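The basic posterior sampling idea in the abstract above can be sketched as follows, assuming known rewards and an independent Dirichlet prior on each row of the transition matrix: draw one MDP from the posterior, act greedily in it for a block of steps, update the transition counts, and repeat. This is not the paper's algorithm. Its optimism mechanism and epoch schedule are not reproduced, the fixed block length `resample_every` is a hypothetical simplification, and the average-reward planning step is approximated with discounted value iteration (`gamma`). All names and parameters here are illustrative.

```python
import random


def sample_dirichlet(rng, alphas):
    """Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_mdp(rng, counts, prior=1.0):
    """counts[s][a][s2] = observed transitions; returns one sampled P[s][a][s2]."""
    return [[sample_dirichlet(rng, [c + prior for c in counts[s][a]])
             for a in range(len(counts[s]))]
            for s in range(len(counts))]


def q_value(P, R, V, s, a, gamma):
    return R[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))


def greedy_policy(P, R, gamma=0.99, iters=500):
    """Discounted value iteration, a stand-in for average-reward planning."""
    S, A = len(P), len(P[0])
    V = [0.0] * S
    for _ in range(iters):
        V = [max(q_value(P, R, V, s, a, gamma) for a in range(A)) for s in range(S)]
    return [max(range(A), key=lambda a: q_value(P, R, V, s, a, gamma))
            for s in range(S)]


def run_posterior_sampling(true_P, R, horizon, seed=0, resample_every=50):
    """Returns the average reward collected over the horizon."""
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    counts = [[[0] * S for _ in range(A)] for _ in range(S)]
    state, total_reward = 0, 0.0
    policy = []
    for t in range(horizon):
        if t % resample_every == 0:
            # Draw one MDP from the posterior and commit to its greedy policy.
            policy = greedy_policy(sample_mdp(rng, counts), R)
        action = policy[state]
        total_reward += R[state][action]
        next_state = rng.choices(range(S), weights=true_P[state][action])[0]
        counts[state][action][next_state] += 1  # posterior update
        state = next_state
    return total_reward / horizon
```

Resampling only at block boundaries, rather than every step, keeps the agent committed to one sampled model long enough to act on it; the paper's own schedule differs.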
Scale-Free Adversarial Multi Armed Bandits

Putta, Sudeep Raja; Agrawal, Shipra (April 2022, International Conference on Algorithmic Learning Theory)

Full Text Available
Online Allocation and Learning in the Presence of Strategic Agents

Yin, Steven; Agrawal, Shipra; Zeevi, Assaf (January 2022, Advances in Neural Information Processing Systems)

Full Text Available
Dynamic Pricing and Learning under the Bass Model

https://doi.org/10.1145/3465456.3467546

Agrawal, Shipra; Yin, Steven; Zeevi, Assaf (July 2021, EC '21: Proceedings of the 22nd ACM Conference on Economics and Computation)
Full Text Available
Robust Repeated First Price Auctions

https://doi.org/10.1145/3465456.3467590

Agrawal, Shipra; Balkanski, Eric; Mirrokni, Vahab; Sivan, Balasubramanian (July 2021, EC '21: Proceedings of the 22nd ACM Conference on Economics and Computation)
Full Text Available