In large-scale and model-free settings, first-order algorithms are often used in an attempt to find the optimal control action without identifying the underlying dynamics. The convergence properties of these algorithms remain poorly understood because of nonconvexity. In this paper, we revisit the continuous-time linear quadratic regulator problem and take a step towards demystifying the efficiency of gradient-based strategies. Despite the lack of convexity, we establish a linear rate of convergence to the globally optimal solution for the gradient descent algorithm. The key step in our analysis is to relate the gradient-flow dynamics associated with the nonconvex formulation to those of a convex reparameterization, which allows us to transfer convergence guarantees from the convex problem to its nonconvex counterpart.
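The gradient descent strategy the abstract analyzes can be sketched numerically. The sketch below uses the standard continuous-time LQR cost and gradient expressions (closed-loop Lyapunov equations); the specific matrices A, B, Q, R, Sigma, the initial gain, and the stepsize are illustrative choices, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Illustrative problem data (not from the paper).
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Sigma = np.eye(2)  # second moment of the random initial state

def cost_and_grad(K):
    """LQR cost J(K) = trace(P Sigma) and its gradient, for stabilizing K."""
    Acl = A - B @ K
    # Closed-loop Lyapunov equations:
    #   Acl' P + P Acl = -(Q + K' R K)   and   Acl L + L Acl' = -Sigma
    P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
    L = solve_continuous_lyapunov(Acl, -Sigma)
    return np.trace(P @ Sigma), 2.0 * (R @ K - B.T @ P) @ L

K = np.array([[3.0, 3.0]])  # a stabilizing initial gain (A - B K is Hurwitz)
for _ in range(50000):
    J, G = cost_and_grad(K)
    if np.linalg.norm(G) < 1e-10:
        break
    K = K - 0.05 * G  # plain gradient descent on the nonconvex landscape

# Over stabilizing gains, the globally optimal gain is the Riccati solution.
P_star = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_star)
```

Despite the nonconvexity of J over the set of stabilizing gains, the iterates converge to the Riccati gain K_star, consistent with the global convergence result stated above.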
Global exponential stability of primal-dual gradient flow dynamics based on the proximal augmented Lagrangian
The primal-dual gradient flow dynamics based on the proximal augmented Lagrangian were introduced in [1] to solve nonsmooth composite optimization problems with a linear equality constraint. We use a Lyapunov-based approach to demonstrate global exponential stability of the underlying dynamics when the differentiable part of the objective function is strongly convex and its gradient is Lipschitz continuous. This also allows us to determine a bound on the stepsize that guarantees linear convergence of the discretized algorithm.
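A minimal discretized sketch of primal-dual gradient dynamics for the simplest instance covered by this setting: a strongly convex smooth objective with a linear equality constraint, using an augmented Lagrangian and a forward-Euler step. The problem data, augmentation weight, and stepsize are assumed values, and the nonsmooth (proximal) part of the framework is omitted here.

```python
import numpy as np

# Illustrative instance: min 0.5*||x - b||^2  subject to  A x = c.
# The objective is strongly convex with a Lipschitz continuous gradient,
# matching the assumptions under which exponential stability is shown.
A = np.array([[1.0, 1.0]])
b = np.array([2.0, 0.0])
c = np.array([1.0])
mu, alpha = 1.0, 0.05  # augmentation weight and Euler stepsize (assumed)

x = np.zeros(2)   # primal variable
nu = np.zeros(1)  # dual variable (Lagrange multiplier)
for _ in range(20000):
    r = A @ x - c                                        # constraint residual
    x = x - alpha * ((x - b) + A.T @ nu + mu * A.T @ r)  # primal descent step
    nu = nu + alpha * r                                  # dual ascent step
```

The fixed point of this iteration is the KKT point of the constrained problem (here x = [1.5, -0.5], nu = [0.5]), and the linear convergence of the discretization mirrors the exponential stability of the continuous-time flow.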
- Award ID(s):
- 1809833
- PAR ID:
- 10128664
- Date Published:
- Journal Name:
- 2019 American Control Conference (ACC)
- Page Range / eLocation ID:
- 3414 to 3419
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Polymers and other glass-forming liquids can exhibit profound alterations in dynamics in the nanoscale vicinity of interfaces, over a range appreciably exceeding that of typical interfacial thermodynamic gradients. The understanding of these dynamical gradients is particularly complicated in systems with internal or external nanoscale dimensions, where a gradient nucleated at one interface can impinge on a second, potentially distinct, interface. To better understand the interactions that govern system dynamics and glass formation in these cases, here we simulate the baseline case of a glass-forming polymer film, over a wide range of thickness, supported on a dynamically neutral substrate that has little effect on nearby dynamics. We compare these results to our prior simulations of freestanding films. Results indicate that dynamical gradients in our simulated systems, as measured by translational relaxation, are simply truncated when they impinge on a secondary surface that is locally dynamically neutral. Altered film behavior can be described almost entirely by gradient effects down to the thinnest films probed, with no evidence for the finite-size effects sometimes posited to play a role in these systems. Finally, our simulations predict that linear gradient overlap effects in the presence of symmetric dynamically active interfaces yield a non-monotonic variation of the whole-film stretching exponent (relaxation time distribution breadth) in freestanding films. The maximum relaxation time distribution breadth in simulation is found at a film thickness of 4–5 times the interfacial gradient range. Observation of this maximum in experiment would provide an important validation that the gradient behavior observed in simulation persists to experimental timescales. If validated, observation of this maximum would potentially also enable determination of the dynamic gradient range from experimental mean-film measurements of film dynamics.
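The linear gradient-overlap and truncation picture described above can be illustrated with a one-dimensional toy profile. The linear decay form, the gradient range XI, and the unit amplitude below are hypothetical stand-ins for the simulated gradient shape, used only to show how overlap flattens and narrows the cross-film distribution of dynamics.

```python
import numpy as np

XI = 1.0  # hypothetical interfacial gradient range (arbitrary units)

def surface_shift(d):
    # Per-surface enhancement of log-mobility, decaying linearly over XI.
    return np.maximum(0.0, 1.0 - d / XI)

def film_profile(h, n=201):
    # Linear superposition of the gradients nucleated at the two surfaces
    # of a freestanding film of thickness h.
    z = np.linspace(0.0, h, n)
    return z, surface_shift(z) + surface_shift(h - z)

_, thick = film_profile(10.0 * XI)   # gradients do not overlap
_, thin = film_profile(1.5 * XI)     # gradients overlap at the midplane
breadth_thick = thick.max() - thick.min()
breadth_thin = thin.max() - thin.min()
```

In this toy model the thick film spans the full gradient (breadth 1.0), while in the thin film the overlapping gradients raise the midplane value and halve the breadth (0.5), a crude analogue of the non-monotonic breadth variation reported above.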
-
Understanding the training dynamics of transformers is important to explain the impressive capabilities behind large language models. In this work, we study the dynamics of training a shallow transformer on a task of recognizing co-occurrence of two designated words. In the literature on training dynamics of transformers, several simplifications are commonly adopted, such as weight reparameterization, attention linearization, special initialization, and the lazy regime. In contrast, we analyze the gradient flow dynamics of simultaneously training three attention matrices and a linear MLP layer from random initialization, and provide a framework for analyzing such dynamics via a coupled dynamical system. We establish a near-minimum training loss and characterize the attention model after training. We discover that gradient flow serves as an inherent mechanism that naturally divides the training process into two phases. In Phase 1, the linear MLP quickly aligns with the two target signals for correct classification, whereas the softmax attention remains almost unchanged. In Phase 2, the attention matrices and the MLP evolve jointly to enlarge the classification margin and reduce the loss to a near-minimum value. Technically, we prove a novel property of the gradient flow, termed "automatic balancing of gradients", which enables the loss values of different samples to decrease almost at the same rate and further facilitates the proof of a near-minimum training loss. We also conduct experiments to verify our theoretical results.
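A co-occurrence task of the kind studied above can be sketched in miniature. The model below is deliberately much simpler than the paper's setting (one trainable attention score vector and a linear readout rather than three attention matrices and an MLP), and the vocabulary, sequence length, data distribution, and stepsize are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
V, T = 6, 8          # vocabulary size and sequence length (illustrative)
TARGETS = (0, 1)     # the two designated co-occurring words

def sample(n):
    # Positives contain both target words; negatives contain neither
    # (a simplification of the co-occurrence task).
    data = []
    for _ in range(n):
        seq = rng.integers(2, V, size=T)          # filler tokens only
        y = 1.0 if rng.random() < 0.5 else -1.0
        if y > 0:
            pos = rng.choice(T, size=2, replace=False)
            seq[pos[0]], seq[pos[1]] = TARGETS
        data.append((np.eye(V)[seq], y))          # T x V one-hot matrix
    return data

def loss_and_grads(X, y, wk, wm):
    s = X @ wk                                 # per-token attention scores
    a = np.exp(s - s.max()); a /= a.sum()      # softmax attention weights
    p = X.T @ a                                # attention-pooled histogram
    f = wm @ p                                 # linear readout
    m = -y / (1.0 + np.exp(y * f))             # d(logistic loss)/df
    Ja = (np.diag(a) - np.outer(a, a)) @ (X @ wm)  # softmax Jacobian action
    return np.log1p(np.exp(-y * f)), m * (X.T @ Ja), m * p

data = sample(200)
wk = 0.01 * rng.standard_normal(V)
wm = 0.01 * rng.standard_normal(V)

def mean_loss():
    return np.mean([loss_and_grads(X, y, wk, wm)[0] for X, y in data])

loss0 = mean_loss()
for _ in range(300):                           # full-batch gradient descent
    gk, gm = np.zeros(V), np.zeros(V)
    for X, y in data:
        _, gki, gmi = loss_and_grads(X, y, wk, wm)
        gk += gki / len(data); gm += gmi / len(data)
    wk -= 0.5 * gk; wm -= 0.5 * gm
loss1 = mean_loss()
```

Even this reduced model exhibits the qualitative behavior described above: the linear readout wm aligns with the target words early, while the attention scores wk adapt more slowly as training proceeds.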
-
Jaggi, Martin (Ed.)A classical approach for solving discrete-time nonlinear control on a finite horizon consists in repeatedly minimizing linear quadratic approximations of the original problem around current candidate solutions. While widely popular in many domains, such an approach has mainly been analyzed locally. We provide detailed convergence guarantees to stationary points, as well as local linear convergence rates, for the Iterative Linear Quadratic Regulator (ILQR) algorithm and its Differential Dynamic Programming (DDP) variant. For problems without costs on control variables, we observe that global convergence to minima can be ensured provided that the linearized discrete-time dynamics are surjective and the costs on the state variables are gradient dominated. We further detail quadratic local convergence when the costs are self-concordant. We show that surjectivity of the linearized dynamics holds for appropriate discretization schemes given the existence of a feedback linearization scheme. We present complexity bounds of algorithms based on linear quadratic approximations through the lens of generalized Gauss-Newton methods. Our analysis uncovers several convergence phases for regularized generalized Gauss-Newton algorithms.
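The "repeatedly minimize a linear quadratic approximation" scheme can be sketched for a scalar system. The toy dynamics, horizon, and cost weights below are invented for illustration; the backward pass uses the standard ILQR Riccati recursion (first-order dynamics expansion, no regularization or line search).

```python
import numpy as np

# Toy scalar system x_{t+1} = x_t + dt*(sin(x_t) + u_t), driven toward 0.
dt, T, x0 = 0.1, 30, 1.0
q, r, qT = 1.0, 0.1, 10.0   # running state, control, and terminal weights

def rollout(u):
    x = np.empty(T + 1); x[0] = x0
    for t in range(T):
        x[t + 1] = x[t] + dt * (np.sin(x[t]) + u[t])
    return x

def total_cost(x, u):
    return 0.5 * (q * np.sum(x[:-1]**2) + r * np.sum(u**2) + qT * x[-1]**2)

def ilqr_step(u):
    """One LQ approximation: backward Riccati pass, then a forward rollout."""
    x = rollout(u)
    Ad = 1.0 + dt * np.cos(x[:-1])   # df/dx along the nominal trajectory
    Bd = dt                          # df/du (constant here)
    Vxx, Vx = qT, qT * x[-1]
    k, K = np.empty(T), np.empty(T)
    for t in reversed(range(T)):
        Qx = q * x[t] + Ad[t] * Vx
        Qu = r * u[t] + Bd * Vx
        Qxx = q + Ad[t] * Vxx * Ad[t]
        Quu = r + Bd * Vxx * Bd
        Qux = Bd * Vxx * Ad[t]
        k[t], K[t] = -Qu / Quu, -Qux / Quu
        Vx = Qx + K[t] * Quu * k[t] + K[t] * Qu + Qux * k[t]
        Vxx = Qxx + K[t] * Quu * K[t] + 2.0 * K[t] * Qux
    xn, un = x0, np.empty(T)
    for t in range(T):                # apply the affine policy to the system
        un[t] = u[t] + k[t] + K[t] * (xn - x[t])
        xn = xn + dt * (np.sin(xn) + un[t])
    return un

u = np.zeros(T)
c0 = total_cost(rollout(u), u)
for _ in range(25):
    u = ilqr_step(u)
c1 = total_cost(rollout(u), u)
```

Each iteration solves the LQ subproblem exactly by dynamic programming, so the cost drops rapidly from the uncontrolled rollout; a practical implementation would add regularization of Quu and a line search, which the local analyses above also account for.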
-
Firoozi, Roya; Mehr, Negar; Yel, Esen; Antonova, Rika; Bohg, Jeannette; Schwager, Mac; Kochenderfer, Mykel (Ed.)This paper investigates the learning, or system identification, of a class of piecewise-affine dynamical systems known as linear complementarity systems (LCSs). We propose a violation-based loss which enables efficient learning of the LCS parameterization, without prior knowledge of the hybrid mode boundaries, using gradient-based methods. The proposed violation-based loss incorporates both a dynamics prediction loss and a novel complementarity-violation loss. We show several properties attained by this loss formulation, including its differentiability, the efficient computation of first- and second-order derivatives, and its relationship to the traditional prediction loss, which strictly enforces complementarity. We apply this violation-based loss formulation to learn LCSs with tens of thousands of (potentially stiff) hybrid modes. The results demonstrate a state-of-the-art ability to identify piecewise-affine dynamics, outperforming methods which must differentiate through non-smooth linear complementarity problems.
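The shape of a violation-based loss can be illustrated in miniature. The penalty form below (squared dynamics residual plus a smooth complementarity term with squared negative-part penalties) is an assumed reconstruction for illustration, not the paper's precise loss, and all system matrices are invented.

```python
import numpy as np

def lcs_violation_loss(A, B, D, E, F, c, x, u, x_next, lam, w=1.0):
    """Dynamics-prediction error plus a smooth complementarity-violation
    penalty for the LCS
        x' = A x + B u + D lam,
        0 <= lam,  E x + F lam + c >= 0,  lam' (E x + F lam + c) = 0.
    Treating lam as a decision variable avoids differentiating through a
    non-smooth linear complementarity solve."""
    dyn = np.sum((x_next - (A @ x + B @ u + D @ lam)) ** 2)
    gap = E @ x + F @ lam + c
    comp = (lam @ gap
            + np.sum(np.minimum(lam, 0.0) ** 2)
            + np.sum(np.minimum(gap, 0.0) ** 2))
    return dyn + w * comp

# A consistent toy datapoint: the gap is strictly positive, so the
# complementarity variable lam is zero and the dynamics are purely affine.
A = 0.9 * np.eye(2); B = np.array([[1.0], [0.0]])
D = np.array([[0.0], [1.0]]); E = np.array([[1.0, 0.0]])
F = np.eye(1); c = np.array([0.5])
x = np.array([1.0, 0.0]); u = np.array([0.2]); lam = np.zeros(1)
x_next = A @ x + B @ u + D @ lam
```

At the true parameters the loss vanishes, and it is everywhere differentiable in the parameters and in lam, which is what makes it amenable to the gradient-based learning described above.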