The work studies cooperative decentralized constrained POMDPs with asymmetric information. Using an extension of Sion's Minimax theorem for functions with positive infinity and results on weak-convergence of measures, strong duality and existence of a saddle point are established for the setting of infinite-horizon expected total discounted costs when the observations lie in a countable space, the actions are chosen from a finite space, the immediate constraint costs are bounded, and the immediate objective cost is bounded from below.
more »
« less
Strong Duality Relations in Nonconvex Risk-Constrained Learning
We establish strong duality relations for functional two-step compositional risk-constrained learning problems with multiple nonconvex loss functions and/or learning constraints, regardless of nonconvexity and under a minimal set of technical assumptions. Our results in particular imply zero duality gaps within the class of problems under study, both extending and improving on the state of the art in (risk-neutral) constrained learning. More specifically, we consider risk objectives/constraints which involve real-valued convex and positively homogeneous risk measures admitting dual representations with bounded risk envelopes, generalizing expectations and including popular examples, such as the conditional value-at-risk (CVaR), the meanabsolute deviation (MAD), and more generally all real-valued coherent risk measures on integrable losses as special cases. Our results are based on recent advances in risk-constrained nonconvex programming in infinite dimensions, which rely on a remarkable new application of J. J. Uhl’s convexity theorem, which is an extension of A. A. Lyapunov’s convexity theorem for general, infinite dimensional Banach spaces. By specializing to the risk-neutral setting, we demonstrate, for the first time, that constrained classification and regression can be treated under a unifying lens, while dispensing certain restrictive assumptions enforced in the current literature, yielding a new state-of-the-art strong duality framework for nonconvex constrained learning.
more »
« less
- Award ID(s):
- 2242215
- PAR ID:
- 10527530
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 979-8-3503-6929-8
- Page Range / eLocation ID:
- 1 to 6
- Format(s):
- Medium: X
- Location:
- Princeton, NJ, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)We present a stochastic descent algorithm for unconstrained optimization that is particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The algorithm maps the gradient onto a low-dimensional ran- dom subspace of dimension at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. Without requiring a full gradient, this mapping can be performed by computing directional deriva- tives (e.g., via forward-mode automatic differentiation). We give proofs for conver- gence in expectation under various convexity assumptions as well as probabilistic convergence results under strong-convexity. Our method provides a novel extension to the well-known Gaussian smoothing technique to descent in subspaces of dimen- sion greater than one, opening the doors to new analysis of Gaussian smoothing when more than one directional derivative is used at each iteration. We also provide a finite-dimensional variant of a special case of the Johnson–Lindenstrauss lemma. Experimentally, we show that our method compares favorably to coordinate descent, Gaussian smoothing, gradient descent and BFGS (when gradients are calculated via forward-mode automatic differentiation) on problems from the machine learning and shape optimization literature.more » « less
-
In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems, and almost entirely ignore the complex noise models that exist in practice. In this work, we demonstrate the restrictiveness of these assumptions using three canonical models in machine learning. Then, we develop novel theory to address this shortcoming in two ways. First, we establish that SGD’s iterates will either globally converge to a stationary point or diverge under nearly arbitrary nonconvexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability.more » « less
-
Melo, S. F.; Fang. F. (Ed.)Existing risk-averse reinforcement learning approaches still face several challenges, including the lack of global optimality guarantee and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to involving visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Transition-based vOlatility-controlled Policy Search (TOPS), a novel algorithm that solves risk-averse problems by learning from transitions. We prove that our algorithm—under the over-parameterized neural network regime—finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient. The convergence rate is comparable to the state-of-the-art risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of TOPS’ performance among existing risk-averse policy search methods.more » « less
-
null (Ed.)We study a sequence of many-agent exit time stochastic control problems, parameterized by the number of agents, with risk-sensitive cost structure. We identify a fully characterizing assumption, under which each such control problem corresponds to a risk-neutral stochastic control problem with additive cost, and sequentially to a risk-neutral stochastic control problem on the simplex that retains only the distribution of states of agents, while discarding further specific information about the state of each agent. Under some additional assumptions, we also prove that the sequence of value functions of these stochastic control problems converges to the value function of a deterministic control problem, which can be used for the design of nearly optimal controls for the original problem, when the number of agents is sufficiently large.more » « less