skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.


Title: Convex Q Learning in a Stochastic Environment
The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The main contributions firstly concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, a significant connection between the solution to the new convex program, and the solution to standard Q-learning with linear function approximation. The second set of contributions concern algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering ``relative'' dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem.  more » « less
Award ID(s):
1935389
NSF-PAR ID:
10483731
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Journal Name:
Proceedings of the IEEE Conference on Decision Control
ISSN:
0743-1546
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sample complexity bounds are a common performance metric in the Reinforcement Learning literature. In the discounted cost, infinite horizon setting, all of the known bounds can be arbitrarily large, as the discount factor approaches unity. These results seem to imply that a very large number of samples is required to achieve an epsilon-optimal policy. The objective of the present work is to introduce a new class of algorithms that have sample complexity uniformly bounded over all discount factors. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that these prior bounds concern value function approximation and not policy approximation. We show that the asymptotic covariance of the tabular Q-learning algorithm with an optimized step-size sequence is a quadratic function of a factor that goes to infinity, as discount factor approaches 1; an essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic covariance that is uniformly bounded over all discount factors. 
    more » « less
  2. Abstract

    This paper studies several solution paths of sparse quadratic minimization problems as a function of the weighing parameter of the bi-objective of estimation loss versus solution sparsity. Three such paths are considered: the “$$\ell _0$$0-path” where the discontinuous$$\ell _0$$0-function provides the exact sparsity count; the “$$\ell _1$$1-path” where the$$\ell _1$$1-function provides a convex surrogate of sparsity count; and the “capped$$\ell _1$$1-path” where the nonconvex nondifferentiable capped$$\ell _1$$1-function aims to enhance the$$\ell _1$$1-approximation. Serving different purposes, each of these three formulations is different from each other, both analytically and computationally. Our results deepen the understanding of (old and new) properties of the associated paths, highlight the pros, cons, and tradeoffs of these sparse optimization models, and provide numerical evidence to support the practical superiority of the capped$$\ell _1$$1-path. Our study of the capped$$\ell _1$$1-path is interesting in its own right as the path pertains to computable directionally stationary (= strongly locally minimizing in this context, as opposed to globally optimal) solutions of a parametric nonconvex nondifferentiable optimization problem. Motivated by classical parametric quadratic programming theory and reinforced by modern statistical learning studies, both casting an exponential perspective in fully describing such solution paths, we also aim to address the question of whether some of them can be fully traced in strongly polynomial time in the problem dimensions. A major conclusion of this paper is that a path of directional stationary solutions of the capped$$\ell _1$$1-regularized problem offers interesting theoretical properties and practical compromise between the$$\ell _0$$0-path and the$$\ell _1$$1-path. Indeed, while the$$\ell _0$$0-path is computationally prohibitive and greatly handicapped by the repeated solution of mixed-integer nonlinear programs, the quality of$$\ell _1$$1-path, in terms of the two criteria—loss and sparsity—in the estimation objective, is inferior to the capped$$\ell _1$$1-path; the latter can be obtained efficiently by a combination of a parametric pivoting-like scheme supplemented by an algorithm that takes advantage of the Z-matrix structure of the loss function.

     
    more » « less
  3. null (Ed.)
    We investigate the approximability of the following optimization problem. The input is an n× n matrix A=(Aij) with real entries and an origin-symmetric convex body K⊂ ℝn that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑i=1n∑j=1n Aij xixj=⟨ x,Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these special cases using case-specific reasoning, here we develop a general methodology for treatment of the approximability and inapproximability aspects of these questions. The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polytime constant-approximability necessitates that K has type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle. However, the link that we establish between optimization and geometry of Banach spaces allows us to devise a generic algorithmic approach to the above problem. We associate to each convex body a new (higher dimensional) auxiliary set that is not convex, but is approximately convex when K has a bounded type-2 constant. If our auxiliary set has an approximate separation oracle, then we design an approximation algorithm for the original quadratic optimization problem, using an approximate version of the ellipsoid method. Even though our hardness result implies that such an oracle does not exist in general, this new question can be solved in specific cases of interest by implementing a range of classical tools from functional analysis, most notably the deep factorization theory of linear operators. Beyond encompassing the scenarios in the literature for which constant-factor approximation algorithms were found, our generic framework implies that that for convex sets with bounded type-2 constant, constant factor approximability is preserved under the following basic operations: (a) Subspaces, (b) Quotients, (c) Minkowski Sums, (d) Complex Interpolation. This yields a rich family of new examples where constant factor approximations are possible, which were beyond the reach of previous methods. We also show (under commonly used complexity assumptions) that for symmetric norms and unitarily invariant matrix norms the type-2 constant nearly characterizes the approximability of quadratic maximization. 
    more » « less
  4. The design of cyber-physical systems (CPSs) requires methods and tools that can efficiently reason about the interaction between discrete models, e.g., representing the behaviors of ``cyber'' components, and continuous models of physical processes. Boolean methods such as satisfiability (SAT) solving are successful in tackling large combinatorial search problems for the design and verification of hardware and software components. On the other hand, problems in control, communications, signal processing, and machine learning often rely on convex programming as a powerful solution engine. However, despite their strengths, neither approach would work in isolation for CPSs. In this paper, we present a new satisfiability modulo convex programming (SMC) framework that integrates SAT solving and convex optimization to efficiently reason about Boolean and convex constraints at the same time. We exploit the properties of a class of logic formulas over Boolean and nonlinear real predicates, termed monotone satisfiability modulo convex formulas, whose satisfiability can be checked via a finite number of convex programs. Following the lazy satisfiability modulo theory (SMT) paradigm, we develop a new decision procedure for monotone SMC formulas, which coordinates SAT solving and convex programming to provide a satisfying assignment or determine that the formula is unsatisfiable. A key step in our coordination scheme is the efficient generation of succinct infeasibility proofs for inconsistent constraints that can support conflict-driven learning and accelerate the search. We demonstrate our approach on different CPS design problems, including spacecraft docking mission control, robotic motion planning, and secure state estimation. We show that SMC can handle more complex problem instances than state-of-the-art alternative techniques based on SMT solving and mixed integer convex programming. 
    more » « less
  5. Berry, Jonathan ; Shmoys, David ; Cowen, Lenore ; Naumann, Uwe (Ed.)
    Continuous DR-submodular functions are a class of functions that satisfy the Diminishing Returns (DR) property, which implies that they are concave along non-negative directions. Existing works have studied monotone continuous DR-submodular maximization subject to a convex constraint and have proposed efficient algorithms with approximation guarantees. However, in many applications, e. g., computing the stability number of a graph and mean-field inference for probabilistic log-submodular models, the DR-submodular function has the additional property of being strongly concave along non-negative directions that could be utilized for obtaining faster convergence rates. In this paper, we first introduce and characterize the class of strongly DR-submodular functions and show how such a property implies strong concavity along non-negative directions. Then, we study L-smooth monotone strongly DR-submodular functions that have bounded curvature, and we show how to exploit such additional structure to obtain algorithms with improved approximation guarantees and faster convergence rates for the maximization problem. In particular, we propose the SDRFW algorithm that matches the provably optimal approximation ratio after only iterations, where c ∈ [0,1] and μ ≥ 0 are the curvature and the strong DR-submodularity parameter. Furthermore, we study the Projected Gradient Ascent (PGA) method for this problem and provide a refined analysis of the algorithm with an improved approximation ratio (compared to ½ in prior works) and a linear convergence rate. Given that both algorithms require knowledge of the smoothness parameter L, we provide a novel characterization of L for DR-submodular functions showing that in many cases, computing L could be formulated as a convex optimization problem, i. e., a geometric program, that could be solved efficiently. Experimental results illustrate and validate the efficiency and effectiveness of our algorithms. 
    more » « less