skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.


Title: Variational Optimization on Lie Groups, with Examples of Leading (Generalized) Eigenvalue Problems
The article considers smooth optimization of functions on Lie groups. By generalizing NAG variational principle in vector space (Wibisono et al., 2016) to general Lie groups, continuous Lie-NAG dynamics which are guaranteed to converge to local optimum are obtained. They correspond to momentum versions of gradient flow on Lie groups. A particular case of SO(đť‘›) is then studied in details, with objective functions corresponding to leading Generalized EigenValue problems: the Lie-NAG dynamics are first made explicit in coordinates, and then discretized in structure preserving fashions, resulting in optimization algorithms with faithful energy behavior (due to conformal symplecticity) and exactly remaining on the Lie group. Stochastic gradient versions are also investigated. Numerical experiments on both synthetic data and practical problem (LDA for MNIST) demonstrate the effectiveness of the proposed methods as optimization algorithms (\emph{not} as a classification method).  more » « less
Award ID(s):
1824798
NSF-PAR ID:
10279890
Author(s) / Creator(s):
;
Editor(s):
Chiappa, Silvia; Calandra, Roberto
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
108
ISSN:
2640-3498
Page Range / eLocation ID:
4269-4280
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms—Nesterov’s accelerated gradient method for strongly convex functions (NAG-) and Polyak’s heavy-ball method—we study an alternative limiting process that yieldshigh-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG- and Polyak’s heavy-ball method, but they allow the identification of a term that we refer to as “gradient correction” that is present in NAG- but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov’s accelerated gradient method for (non-strongly) convex functions, uncovering a hitherto unknown result—that NAG- minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG- for smooth convex functions.

     
    more » « less
  2. null (Ed.)
    Gradient descent-based optimization methods underpin the parameter training of neural networks, and hence comprise a significant component in the impressive test results found in a number of applications. Introducing stochasticity is key to their success in practical problems, and there is some understanding of the role of stochastic gradient descent in this context. Momentum modifications of gradient descent such as Polyak’s Heavy Ball method (HB) and Nesterov’s method of accelerated gradients (NAG), are also widely adopted. In this work our focus is on understanding the role of momentum in the training of neural networks, concentrating on the common situation in which the momentum contribution is fixed at each step of the algorithm. To expose the ideas simply we work in the deterministic setting. Our approach is to derive continuous time approximations of the discrete algorithms; these continuous time approximations provide insights into the mechanisms at play within the discrete algorithms. We prove three such approximations. Firstly we show that standard implementations of fixed momentum methods approximate a time-rescaled gradient descent flow, asymptotically as the learning rate shrinks to zero; this result does not distinguish momentum methods from pure gradient descent, in the limit of vanishing learning rate. We then proceed to prove two results aimed at understanding the observed practical advantages of fixed momentum methods over gradient descent, when implemented in the non-asymptotic regime with fixed small, but non-zero, learning rate. We achieve this by proving approximations to continuous time limits in which the small but fixed learning rate appears as a parameter; this is known as the method of modified equations in the numerical analysis literature, recently rediscovered as the high resolution ODE approximation in the machine learning context. In our second result we show that the momentum method is approximated by a continuous time gradient flow, with an additional momentum-dependent second order time-derivative correction, proportional to the learning rate; this may be used to explain the stabilizing effect of momentum algorithms in their transient phase. Furthermore in a third result we show that the momentum methods admit an exponentially attractive invariant manifold on which the dynamics reduces, approximately, to a gradient flow with respect to a modified loss function, equal to the original loss function plus a small perturbation proportional to the learning rate; this small correction provides convexification of the loss function and encodes additional robustness present in momentum methods, beyond the transient phase. 
    more » « less
  3. null (Ed.)
    Motivated by the fact that the gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs), here we derive an ODE representation of the accelerated triple momentum (TM) algorithm. For unconstrained optimization problems with strongly convex cost, the TM algorithm has a proven faster convergence rate than the Nesterov's accelerated gradient (NAG) method but with the same computational complexity. We show that similar to the NAG method, in order to accurately capture the characteristics of the TM method, we need to use a high-resolution modeling to obtain the ODE representation of the TM algorithm. We propose a Lyapunov analysis to investigate the stability and convergence behavior of the proposed high-resolution ODE representation of the TM algorithm. We compare the rate of the ODE representation of the TM method with that of the NAG method to confirm its faster convergence. Our study also leads to a tighter bound on the worst rate of convergence for the ODE model of the NAG method. In this paper, we also discuss the use of the integral quadratic constraint (IQC) method to establish an estimate on the rate of convergence of the TM algorithm. A numerical example verifies our results. 
    more » « less
  4. Abstract

    Convergence analysis of accelerated first-order methods for convex optimization problems are developed from the point of view of ordinary differential equation solvers. A new dynamical system, called Nesterov accelerated gradient (NAG) flow, is derived from the connection between acceleration mechanism andA-stability of ODE solvers, and the exponential decay of a tailored Lyapunov function along with the solution trajectory is proved. Numerical discretizations of NAG flow are then considered and convergence rates are established via a discrete Lyapunov function. The proposed differential equation solver approach can not only cover existing accelerated methods, such as FISTA, Güler’s proximal algorithm and Nesterov’s accelerated gradient method, but also produce new algorithms for composite convex optimization that possess accelerated convergence rates. Both the convex and the strongly convex cases are handled in a unified way in our approach.

     
    more » « less
  5. Abstract

    Phenotypic plasticity of traits is commonly measured in plants to improve understanding of organismal and ecosystem responses to climate change but is far less studied for microbes. Specifically, decomposer fungi are thought to display high levels of phenotypic plasticity and their functions have important implications for ecosystem dynamics. Assessing the phenotypic plasticity of fungal traits may therefore be important for predicting fungal community response to climate change. Here, we assess the phenotypic plasticity of 15 fungal isolates (12 species) from a Southern California grassland. Fungi were incubated on litter at five moisture levels (ranging from 4–50% water holding capacity) and at five temperatures (ranging from 4–36 °C). After incubation, fungal biomass and activities of four extracellular enzymes (cellobiohydrolase (CBH), β-glucosidase (BG), β-xylosidase (BX), and N-acetyl-β-D-glucosaminidase (NAG)) were measured. We used response surface methodology to determine how fungal phenotypic plasticity differs across the moisture-temperature gradient. We hypothesized that fungal biomass and extracellular enzyme activities would vary with moisture and temperature and that the shape of the response surface would vary between fungal isolates. We further hypothesized that more closely related fungi would show more similar response surfaces across the moisture-temperature gradient. In support of our hypotheses, we found that plasticity differed between fungi along the temperature gradient for fungal biomass and for all the extracellular enzyme activities. Plasticity also differed between fungi along the moisture gradient for BG activity. These differences appear to be caused by variation mainly at the moisture and temperature extremes. We also found that more closely related fungi had more similar extracellular enzymes activities at the highest temperature. Altogether, this evidence suggests that with global warming, fungal biodiversity may become increasingly important as functional traits tend to diverge along phylogenetic lines at higher temperatures.

     
    more » « less