skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Strong Duality Relations in Nonconvex Risk-Constrained Learning
We establish strong duality relations for functional two-step compositional risk-constrained learning problems with multiple nonconvex loss functions and/or learning constraints, regardless of nonconvexity and under a minimal set of technical assumptions. Our results in particular imply zero duality gaps within the class of problems under study, both extending and improving on the state of the art in (risk-neutral) constrained learning. More specifically, we consider risk objectives/constraints which involve real-valued convex and positively homogeneous risk measures admitting dual representations with bounded risk envelopes, generalizing expectations and including popular examples, such as the conditional value-at-risk (CVaR), the meanabsolute deviation (MAD), and more generally all real-valued coherent risk measures on integrable losses as special cases. Our results are based on recent advances in risk-constrained nonconvex programming in infinite dimensions, which rely on a remarkable new application of J. J. Uhl’s convexity theorem, which is an extension of A. A. Lyapunov’s convexity theorem for general, infinite dimensional Banach spaces. By specializing to the risk-neutral setting, we demonstrate, for the first time, that constrained classification and regression can be treated under a unifying lens, while dispensing certain restrictive assumptions enforced in the current literature, yielding a new state-of-the-art strong duality framework for nonconvex constrained learning.  more » « less
Award ID(s):
2242215
PAR ID:
10527530
Author(s) / Creator(s):
;
Publisher / Repository:
IEEE
Date Published:
ISBN:
979-8-3503-6929-8
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Location:
Princeton, NJ, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. The work studies cooperative decentralized constrained POMDPs with asymmetric information. Using an extension of Sion's Minimax theorem for functions with positive infinity and results on weak-convergence of measures, strong duality and existence of a saddle point are established for the setting of infinite-horizon expected total discounted costs when the observations lie in a countable space, the actions are chosen from a finite space, the immediate constraint costs are bounded, and the immediate objective cost is bounded from below. 
    more » « less
  2. This paper addresses the study of a new class of nonsmooth optimization prob lems, where the objective is represented as a difference of two generally nonconvex functions. We propose and develop a novel Newton-type algorithm to solving such problems, which is based on the coderivative generated second-order subdifferential (generalized Hessian) and employs advanced tools of variational analysis. Well posedness properties of the proposed algorithm are derived under fairly general requirements, while constructive convergence rates are established by using additional assumptions including the Kurdyka–Łojasiewicz condition. We provide applications of the main algorithm to solving a general class of nonsmooth nonconvex problems of structured optimization that encompasses, in particular, optimization problems with explicit constraints. Finally, applications and numerical experiments are given for solving practical problems that arise in biochemical models, supervised learning, constrained quadratic programming, etc., where advantages of our algorithms are demonstrated in comparison with some known techniques and results. 
    more » « less
  3. Abstract Many inverse problems are naturally formulated as a PDE-constrained optimization problem. These non-linear, large-scale, constrained optimization problems know many challenges, of which the inherent non-linearity of the problem is an important one. In this paper, we focus on a relaxed formulation of the PDE-constrained optimization problem and provide analysis for its properties including convexity under certain assumptions. Starting from an infinite-dimensional formulation of the inverse problem with discrete data, we propose a general framework for the analysis and discretisation of such problems. The relaxed formulation of the PDE-constrained optimization problem is shown to reduce to a weighted non-linear least-squares problem. The weight matrix turns out to be the Gram matrix of solutions of the PDE, and in some cases be estimated directly from the measurements. The latter observation points to a potential way to unify recently proposed data-driven reduced-order models for inverse problems with PDE-constrained optimization. We provide a number of representative case studies and numerical examples to illustrate our findings. 
    more » « less
  4. null (Ed.)
    We present a stochastic descent algorithm for unconstrained optimization that is particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The algorithm maps the gradient onto a low-dimensional ran- dom subspace of dimension at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. Without requiring a full gradient, this mapping can be performed by computing directional deriva- tives (e.g., via forward-mode automatic differentiation). We give proofs for conver- gence in expectation under various convexity assumptions as well as probabilistic convergence results under strong-convexity. Our method provides a novel extension to the well-known Gaussian smoothing technique to descent in subspaces of dimen- sion greater than one, opening the doors to new analysis of Gaussian smoothing when more than one directional derivative is used at each iteration. We also provide a finite-dimensional variant of a special case of the Johnson–Lindenstrauss lemma. Experimentally, we show that our method compares favorably to coordinate descent, Gaussian smoothing, gradient descent and BFGS (when gradients are calculated via forward-mode automatic differentiation) on problems from the machine learning and shape optimization literature. 
    more » « less
  5. In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems, and almost entirely ignore the complex noise models that exist in practice. In this work, we demonstrate the restrictiveness of these assumptions using three canonical models in machine learning. Then, we develop novel theory to address this shortcoming in two ways. First, we establish that SGD’s iterates will either globally converge to a stationary point or diverge under nearly arbitrary nonconvexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability. 
    more » « less