

Title: On the Derivation of Quasi-Newton Formulas for Optimization in Function Spaces
Newton's method is usually preferred for solving optimization problems because of its superior convergence properties compared to gradient-based or derivative-free algorithms. However, deriving and computing the second-order derivatives that Newton's method needs is often nontrivial and, in some cases, impossible. In such cases quasi-Newton algorithms are a great alternative. In this paper, we provide a new derivation of well-known quasi-Newton formulas in an infinite-dimensional Hilbert space setting. It is known that quasi-Newton update formulas are solutions to certain variational problems over the space of symmetric matrices. In this paper, we formulate analogous variational problems over the space of bounded symmetric operators on a Hilbert space. By changing the constraints of the variational problem we obtain updates (for the Hessian and the Hessian inverse) not only for the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton method but also for the Davidon–Fletcher–Powell (DFP), Symmetric Rank One (SR1), and Powell–Symmetric–Broyden (PSB) methods. In addition, for an inverse problem governed by a partial differential equation (PDE), we derive "structured" DFP and BFGS secant formulas that use the derivative of the regularization term exactly and only approximate the second derivative of the misfit term. We show numerical results that demonstrate the desired mesh-independence property and the superior performance of the resulting quasi-Newton methods.
Award ID(s): 1654311
NSF-PAR ID: 10184514
Journal Name: Numerical Functional Analysis and Optimization
Volume: 41
Issue: 13
ISSN: 0163-0563
Page Range: 1564–1587
Sponsoring Org: National Science Foundation
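As a concrete finite-dimensional illustration of the update formulas the abstract refers to, the following is a minimal NumPy sketch of the BFGS and DFP inverse-Hessian updates (the function names are ours, not from the paper). Given a step `s = x_new - x_old` and gradient change `y = g_new - g_old`, both updates satisfy the secant equation `H_new @ y == s` by construction:

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """BFGS update of an inverse-Hessian approximation H from step s and
    gradient change y: H+ = V H V^T + rho s s^T with V = I - rho s y^T."""
    rho = 1.0 / (y @ s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)

def dfp_inverse_update(H, s, y):
    """DFP update of an inverse-Hessian approximation:
    H+ = H + s s^T / (y^T s) - (H y)(H y)^T / (y^T H y)."""
    Hy = H @ y
    return H + np.outer(s, s) / (y @ s) - np.outer(Hy, Hy) / (y @ Hy)
```

In the infinite-dimensional setting of the paper these become updates of bounded symmetric operators, but the algebraic structure is the same.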
More Like this
  1. Abstract

In this paper, we study and prove the non-asymptotic superlinear convergence rate of the Broyden class of quasi-Newton algorithms, which includes the Davidon–Fletcher–Powell (DFP) method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. The asymptotic superlinear convergence rate of these quasi-Newton methods has been extensively studied in the literature, but their explicit finite-time local convergence rate has not been fully investigated. In this paper, we provide a finite-time (non-asymptotic) convergence analysis for Broyden quasi-Newton algorithms under the assumptions that the objective function is strongly convex, its gradient is Lipschitz continuous, and its Hessian is Lipschitz continuous at the optimal solution. We show that in a local neighborhood of the optimal solution, the iterates generated by both DFP and BFGS converge to the optimal solution at a superlinear rate of $(1/k)^{k/2}$, where $k$ is the number of iterations. We also prove that a similar local superlinear convergence result holds when the objective function is self-concordant. Numerical experiments on several datasets confirm our explicit convergence rate bounds. Our theoretical guarantee is one of the first results to provide a non-asymptotic superlinear convergence rate for quasi-Newton methods.
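The Broyden class analyzed above can be written as a one-parameter family of Hessian updates interpolating between BFGS and DFP. A standard textbook form of this family (not code from the paper) is sketched below; every member satisfies the secant equation `B_new @ s == y` for any value of the parameter `phi`:

```python
import numpy as np

def broyden_class_update(B, s, y, phi):
    """Broyden-class update of a Hessian approximation B:
    phi = 0 gives BFGS, phi = 1 gives DFP, and every member
    satisfies the secant equation B_new @ s = y."""
    Bs = B @ s
    sBs = s @ Bs
    ys = y @ s
    v = y / ys - Bs / sBs  # note: v @ s = 0, so the phi term preserves secant
    return (B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / ys
            + phi * sBs * np.outer(v, v))
```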

     
  2. Many machine learning problems optimize an objective that must be measured with noise. The primary method is first-order stochastic gradient descent using one or more Monte Carlo (MC) samples at each step. There are settings where ill-conditioning makes second-order methods such as limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) more effective. We study the use of randomized quasi-Monte Carlo (RQMC) sampling for such problems. When MC sampling has a root mean squared error (RMSE) of $O(n^{-1/2})$, RQMC has an RMSE of $o(n^{-1/2})$ that can be close to $O(n^{-3/2})$ in favorable settings. We prove that improved sampling accuracy translates directly to improved optimization. In our empirical investigations for variational Bayes, using RQMC with a stochastic quasi-Newton method greatly speeds up the optimization, and sometimes finds a better parameter value than MC does.
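A simple one-dimensional instance of the RQMC idea is a randomly shifted low-discrepancy sequence. The sketch below (ours, not from the paper) builds the base-2 van der Corput sequence and applies a random shift modulo 1, so the estimator is unbiased while keeping the points far more evenly spread than i.i.d. uniforms; `scipy.stats.qmc` offers production-quality Sobol and Halton generators with scrambling:

```python
import numpy as np

def van_der_corput(n):
    """First n points of the base-2 van der Corput low-discrepancy sequence."""
    pts = np.zeros(n)
    for i in range(n):
        f, x, k = 0.5, 0.0, i
        while k > 0:
            x += f * (k % 2)  # reflect binary digits of i about the radix point
            k //= 2
            f /= 2
        pts[i] = x
    return pts

def rqmc_estimate(f, n, rng):
    """Randomized QMC estimate of E[f(U)], U ~ Uniform(0,1),
    via a Cranley-Patterson random shift of the QMC points."""
    shifted = (van_der_corput(n) + rng.random()) % 1.0
    return f(shifted).mean()
```

The first $2^k$ shifted points land exactly one per dyadic interval of width $2^{-k}$, which is the stratification behind the improved RMSE.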
  3. Al-Baali, Mehiddin; Purnama, Anton; Grandinetti, Lucio (Eds.)
    Second-order, Newton-like algorithms exhibit convergence properties superior to those of gradient-based or derivative-free optimization algorithms. However, deriving and computing the second-order derivatives needed for the Hessian-vector product in a Krylov iteration for the Newton step is often nontrivial. Second-order adjoints provide a systematic and efficient tool to derive second-derivative information. In this paper, we consider equality-constrained optimization problems in an infinite-dimensional setting. We phrase the optimization problem in a general Banach space framework and derive second-order sensitivities and second-order adjoints in a rigorous and general way. We apply the developed framework to a partial differential equation-constrained optimization problem.
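The paper derives exact Hessian-vector products via second-order adjoints. A common cheap stand-in, useful for sanity-checking a Krylov-Newton implementation but not the paper's method, is to finite-difference the gradient, since a Krylov solver only ever needs the product $H(x)v$, never $H(x)$ itself:

```python
import numpy as np

def hessian_vector_product(grad, x, v, eps=1e-6):
    """Approximate H(x) @ v by forward finite differences of the gradient:
    H(x) v ~= (grad(x + eps*v) - grad(x)) / eps.
    Exact second-order adjoints avoid the truncation error of this formula."""
    return (grad(x + eps * v) - grad(x)) / eps
```

For a quadratic objective $f(x) = \tfrac{1}{2} x^\top A x$ the gradient is linear, so the finite-difference product reproduces $Av$ up to rounding error.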
  4. The phase field method is becoming the de facto choice for the numerical analysis of complex problems that involve multiple initiating, propagating, interacting, branching and merging fractures. However, within the context of finite element modelling, the method requires a fine mesh in regions where fractures will propagate, in order to capture sharp variations in the phase field representing the fractured/damaged regions. This means that the method can become computationally expensive when the fracture propagation paths are not known a priori. This paper presents a 2D hp-adaptive discontinuous Galerkin finite element method for phase field fracture that includes a posteriori error estimators for both the elasticity and phase field equations, which drive mesh adaptivity for static and propagating fractures. This combination means that it is possible to reliably and efficiently solve phase field fracture problems with arbitrary initial meshes, irrespective of the initial geometry or loading conditions. This ability is demonstrated on several example problems, which are solved using a light-BFGS (Broyden–Fletcher–Goldfarb–Shanno) quasi-Newton algorithm. The examples highlight the importance of driving mesh adaptivity using both the elasticity and phase field errors for physically meaningful, yet computationally tractable, results. They also reveal the importance of including p-refinement, which is typically not included in existing phase field literature. The above features provide a powerful and general tool for modelling fracture propagation with controlled errors and degree-of-freedom optimised meshes.
  5. Abstract. We introduce a new software package called "icepack" for modeling the flow of glaciers and ice sheets. The icepack package is built on the finite element modeling library Firedrake, which uses the Unified Form Language (UFL), a domain-specific language embedded in Python for describing weak forms of partial differential equations. The diagnostic models in icepack are formulated through action principles that are specified in UFL. The components of each action functional can be substituted for different forms of the user's choosing, which makes it easy to experiment with the model physics. The action functional itself can be used to define a solver convergence criterion that is independent of the mesh and requires little tuning on the part of the user. The icepack package includes the 2D shallow ice and shallow stream models. We have also defined a 3D hybrid model based on spectral semi-discretization of the Blatter–Pattyn equations. Finally, icepack includes a Gauss–Newton solver for inverse problems that runs substantially faster than the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method often used in the glaciological literature. The overall design philosophy of icepack is to be as usable as possible for a wide swath of the glaciological community, including both experts and novices in computational science.
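Gauss–Newton, mentioned above as icepack's inverse-problem solver, minimizes a nonlinear least-squares objective $\tfrac{1}{2}\|r(x)\|^2$ by replacing the Hessian with $J^\top J$, which needs only first derivatives. The following is a bare-bones sketch of the iteration (ours, not icepack's implementation, and without the line search or regularization a production solver would use):

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=20):
    """Gauss-Newton iteration for min 0.5*||r(x)||^2:
    at each step solve (J^T J) d = J^T r and set x <- x - d.
    J^T J approximates the Hessian using only first derivatives."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        x = x - np.linalg.solve(J.T @ J, J.T @ r)
    return x
```

For small-residual problems this converges nearly as fast as Newton's method, which is the source of the speedup over BFGS reported above.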