skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Zap Q-Learning - A User's Guide
The authors develop a theory characterizing optimal stopping times for discrete-time ergodic Markov processes with discounted rewards. The theory differs from prior work by its view of per-stage and terminal reward functions as elements of a certain Hilbert space. In addition to a streamlined analysis establishing existence and uniqueness of a solution to Bellman's equation, this approach provides an elegant framework for the study of approximate solutions. In particular, the authors propose a stochastic approximation algorithm that tunes weights of a linear combination of basis functions in order to approximate a value function. They prove that this algorithm converges (almost surely) and that the limit of convergence has some desirable properties. The utility of the approximation method is illustrated via a computational case study involving the pricing of a path dependent financial derivative security that gives rise to an optimal stopping problem with a 100-dimensional state space  more » « less
Award ID(s):
1646229
PAR ID:
10211835
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proc. of the Fifth Indian Control Conference
Page Range / eLocation ID:
10 to 15
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We provide an approximation algorithm for network revenue management problems. In our approximation algorithm, we construct an approximate policy using value function approximations that are expressed as linear combinations of basis functions. We use a backward recursion to compute the coefficients of the basis functions in the linear combinations. If each product uses at most L resources, then the total expected revenue obtained by our approximate policy is at least [Formula: see text] of the optimal total expected revenue. In many network revenue management settings, although the number of resources and products can become large, the number of resources used by a product remains bounded. In this case, our approximate policy provides a constant-factor performance guarantee. Our approximate policy can handle nonstationarities in the customer arrival process. To our knowledge, our approximate policy is the first approximation algorithm for network revenue management problems under nonstationary arrivals. Our approach can incorporate the customer choice behavior among the products, and allows the products to use multiple units of a resource, while still maintaining the performance guarantee. In our computational experiments, we demonstrate that our approximate policy performs quite well, providing total expected revenues that are substantially better than its theoretical performance guarantee. 
    more » « less
  2. New families of direct serendipity and direct mixed finite elements on general planar, strictly convex polygons were recently defined by the authors. The finite elements of index r are H1 and H(div) conforming, respectively, and approximate optimally to order r+1 while using the minimal number of degrees of freedom. The shape function space consists of the full set of polynomials defined directly on the element and augmented with a space of supplemental functions. The supplemental functions were constructed as rational functions, which can be difficult to integrate accurately using numerical quadrature rules when the index is high. This can result in a loss of accuracy in certain cases. In this work, we propose alternative ways to construct the supplemental functions on the element as continuous piecewise polynomials. One approach results in supplemental functions that are in Hp for any p≥1. We prove the optimal approximation property for these new finite elements. We also perform numerical tests on them, comparing results for the original supplemental functions and the various alternatives. The new piecewise polynomial supplements can be integrated accurately, and therefore show better robustness with respect to the underlying meshes used. 
    more » « less
  3. Kumar, Amit; Ron-Zewi, Noga (Ed.)
    We initiate the study of centralized algorithms for welfare-maximizing allocation of goods to buyers subject to average-value constraints. We show that this problem is NP-hard to approximate beyond a factor of e/(e-1), and provide a 4e/(e-1)-approximate offline algorithm. For the online setting, we show that no non-trivial approximations are achievable under adversarial arrivals. Under i.i.d. arrivals, we present a polytime online algorithm that provides a constant approximation of the optimal (computationally-unbounded) online algorithm. In contrast, we show that no constant approximation of the ex-post optimum is achievable by an online algorithm. 
    more » « less
  4. null (Ed.)
    The primary objective of this paper is to develop computationally efficient methods for optimal stopping of an adaptive Phase II dose-finding clinical trial, where the decision maker may terminate the trial for efficacy or abandon it as a result of futility. We develop two solution methods and compare them in terms of computational time and several performance metrics such as the probability of correct stopping decision. One proposed method is an application of the one-step look-ahead policy to this problem. The second proposal builds a diffusion approximation to the state variable in the continuous regime and approximates the trial’s stopping time by optimal stopping of a diffusion process. The secondary objective of the paper is to compare these methods on different dose-response curves, particularly when the true dose-response curve has no significant advantage over a placebo. Our results, which include a real clinical trial case study, show that look-ahead policies perform poorly in terms of the probability of correct decision in this setting, whereas our diffusion approximation method provides robust solutions. 
    more » « less
  5. We study optimal design problems in which the goal is to choose a set of linear measurements to obtain the most accurate estimate of an unknown vector. We study the [Formula: see text]-optimal design variant where the objective is to minimize the average variance of the error in the maximum likelihood estimate of the vector being measured. We introduce the proportional volume sampling algorithm to obtain nearly optimal bounds in the asymptotic regime when the number [Formula: see text] of measurements made is significantly larger than the dimension [Formula: see text] and obtain the first approximation algorithms whose approximation factor does not degrade with the number of possible measurements when [Formula: see text] is small. The algorithm also gives approximation guarantees for other optimal design objectives such as [Formula: see text]-optimality and the generalized ratio objective, matching or improving the previously best-known results. We further show that bounds similar to ours cannot be obtained for [Formula: see text]-optimal design and that [Formula: see text]-optimal design is NP-hard to approximate within a fixed constant when [Formula: see text]. 
    more » « less