-
Abstract We propose novel methods for adaptive time-series forecasting and prediction-interval construction, illustrated with COVID-19 case and death counts. Our framework applies an automated transformation to reduce heteroscedasticity, then imposes constrained smoothing near the forecast edge via robust quadratic regression that emphasizes recent data. A Long Short-Term Memory (LSTM) model combined with ARIMA-based noise correction further refines the forecast. Compared with conventional methods (e.g., ARIMA alone, unprocessed deep learning), this adaptive approach achieves superior accuracy metrics and reliable bootstrap-derived confidence and prediction intervals. We also highlight how reinforcement learning (RL) offers promising avenues for real-time decision-making and further improvements in forecasting adaptability.
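The transform-then-smooth stage of such a pipeline can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `log1p` variance-stabilizing transform, the 21-point window, the Huber threshold `delta`, and the linear recency weights are all assumptions, and the LSTM/ARIMA correction stage is omitted.

```python
import numpy as np

def stabilize(counts):
    """Variance-stabilizing log transform for count data (illustrative choice)."""
    return np.log1p(np.asarray(counts, dtype=float))

def edge_quadratic_forecast(y, horizon=7, window=21, delta=1.0):
    """Fit a robust (Huber-weighted) quadratic to the last `window` points,
    upweighting recent observations, then extrapolate `horizon` steps ahead."""
    y = np.asarray(y, dtype=float)[-window:]
    t = np.arange(len(y))
    recency = np.linspace(0.5, 1.0, len(y))   # emphasize recent data
    X = np.vander(t, 3)                       # columns: t^2, t, 1
    beta = np.zeros(3)
    for _ in range(20):                       # iteratively reweighted least squares
        r = y - X @ beta
        huber = np.where(np.abs(r) <= delta, 1.0,
                         delta / np.maximum(np.abs(r), 1e-12))
        w = recency * huber
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    t_future = np.arange(len(y), len(y) + horizon)
    return np.vander(t_future, 3) @ beta
```

In practice the forecast would be produced on the transformed scale and mapped back with `np.expm1` after adding the learned residual correction.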
-
Abstract This paper introduces the first asymptotically optimal strategy for a multi-armed bandit (MAB) model under side constraints. The side constraints model situations in which bandit activations are limited by the availability of certain resources that are replenished at a constant rate. The main result involves the derivation of an asymptotic lower bound for the regret of feasible uniformly fast policies and the construction of policies that achieve this lower bound under pertinent conditions. Further, we provide the explicit form of such policies for the case in which the unknown distributions are Normal with unknown means and known variances, for the case of Normal distributions with unknown means and unknown variances, and for the case of arbitrary discrete distributions with finite support.
-
Abstract We investigate a data-driven dynamic inventory control problem involving fixed setup costs and lost sales. Random demand arrivals stem from a demand distribution known only to belong to a vast ambiguity set. Lost sales and demand ambiguity together complicate the problem through censoring, namely, the firm's inability to observe the lost portion of the demand data. Our main policy idea advocates periodically ordering up to high levels for learning purposes and, in intervening periods, cleverly exploiting the information gained in learning periods. By regret, we mean the price paid for ambiguity in long-run average performance. When demand has a finite support, we can accomplish a regret bound whose order almost matches a known lower bound, as long as inventory costs are genuinely convex. Major policy adjustments are warranted for the more complex case involving an unbounded demand support. The resulting regret rate depends on the nature of moment-related bounds that help characterize the degree of ambiguity, and it improves further when distributions are light-tailed. Our simulation demonstrates the merits of various policy ideas.
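The learn/exploit cycle described above might look like the following toy simulation. Everything here is a hypothetical stand-in for the paper's construction: the uniform demand distribution, the cycle length `learn_every`, the high stocking level `S_high`, and the empirical 0.8-quantile exploitation rule are illustrative assumptions only.

```python
import random

def simulate_learning_policy(T=1000, S_high=50, learn_every=10, seed=0):
    """Toy sketch: every `learn_every` periods, order up to a high level so
    demand is observed uncensored; in between, order up to an empirical
    quantile of the demands recorded during learning periods."""
    rng = random.Random(seed)
    observed = []            # uncensored demand samples from learning periods
    lost = 0
    for t in range(T):
        demand = rng.randint(0, 30)              # unknown distribution (toy)
        if t % learn_every == 0 or not observed:
            stock = S_high                       # learning period: stock high
            observed.append(demand)              # demand fully observed
        else:
            s = sorted(observed)
            stock = s[int(0.8 * (len(s) - 1))]   # empirical-quantile exploit
        lost += max(demand - stock, 0)           # censored portion of demand
    return lost, len(observed)
```

The design point the sketch captures is that censoring makes ordinary demand data unusable for learning the tail, so the policy deliberately pays extra holding cost in learning periods to buy uncensored observations.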
-
Abstract We study new types of dynamic allocation problems, the Halting Bandit models. As an application, we obtain new proofs of the classic Gittins index decomposition result (cf. Gittins, Journal of the Royal Statistical Society, Series B, 1979, 41, 148–177) and of recent results of the authors in Cowan and Katehakis (Probability in the Engineering and Informational Sciences, 2015, 29, 51–76).
-
When a new product has just been introduced or the economy has just entered a new phase, a firm is often at a loss as to what the underlying demand pattern has become, let alone how best to respond to it. In “Dynamic Inventory Control with Fixed Setup Costs and Unknown Discrete Demand Distribution,” Davoodi, Katehakis, and Yang tackle this challenging problem by tailoring ordering decisions to empirical distributions formed from past demand observations. In the presence of fixed setup costs, however, an (s,S) policy that is optimal in the conventional known-distribution setting would take many periods for its long-term benefit to be realized. Therefore, a good online policy has to balance letting ordering decisions settle for long stretches against adjusting them frequently to take advantage of newly available information. When properly balanced, such policies can indeed achieve tight bounds on the regret performance measure.
-
The purpose of this paper is to provide further understanding of the structure of the sequential allocation (“stochastic multi-armed bandit”) problem by establishing probability-one finite-horizon bounds and convergence rates for the sample regret associated with two simple classes of allocation policies. For any slowly increasing function g, subject to mild regularity constraints, we construct two policies (the g-Forcing and the g-Inflated Sample Mean) that achieve a measure of regret of order O(g(n)) almost surely as n → ∞, bounded from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function g effectively controls the “exploration” of the classical “exploration/exploitation” tradeoff.
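One plausible reading of an inflated-sample-mean index policy can be sketched as follows. The UCB-style inflation term sqrt(g(n)/n_i) is an assumption modeled on standard index policies, not the paper's exact construction, and the Gaussian test arms are illustrative.

```python
import math
import random

def g_inflated_sample_mean(arms, T=5000, g=math.log, seed=1):
    """Sketch in the spirit of a g-Inflated Sample Mean policy: at each step,
    play the arm maximizing (sample mean + sqrt(g(n) / n_i)), where g is a
    slowly increasing function and n_i is the arm's pull count."""
    k = len(arms)
    rng = random.Random(seed)
    counts = [0] * k
    sums = [0.0] * k
    for i in range(k):                       # play each arm once to initialize
        counts[i] += 1
        sums[i] += arms[i](rng)
    for n in range(k, T):
        i = max(range(k),
                key=lambda j: sums[j] / counts[j]
                              + math.sqrt(max(g(n), 0.0) / counts[j]))
        counts[i] += 1
        sums[i] += arms[i](rng)
    return counts

# Two Gaussian arms with means 0.0 and 0.5; the better arm should dominate.
counts = g_inflated_sample_mean(
    [lambda rng: rng.gauss(0.0, 1.0), lambda rng: rng.gauss(0.5, 1.0)])
```

With g = log this reduces to a familiar UCB-type index; the point of the g-parametrization is that a slower-growing g forces less exploration and hence a smaller O(g(n)) regret order, at the cost of weaker guarantees.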