skip to main content

This content will become publicly available on October 1, 2024

Title: Convergence analysis of a block preconditioned steepest descent eigensolver with implicit deflation

Gradient‐type iterative methods for solving Hermitian eigenvalue problems can be accelerated by using preconditioning and deflation techniques. A preconditioned steepest descent iteration with implicit deflation (PSD‐id) is one of such methods. The convergence behavior of the PSD‐id is recently investigated based on the pioneering work of Samokish on the preconditioned steepest descent method (PSD). The resulting non‐asymptotic estimates indicate a superlinear convergence of the PSD‐id under strong assumptions on the initial guess. The present paper utilizes an alternative convergence analysis of the PSD by Neymeyr under much weaker assumptions. We embed Neymeyr's approach into the analysis of the PSD‐id using a restricted formulation of the PSD‐id. More importantly, we extend the new convergence analysis of the PSD‐id to a practically preferred block version of the PSD‐id, or BPSD‐id, and show the cluster robustness of the BPSD‐id. Numerical examples are provided to validate the theoretical estimates.

more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Date Published:
Journal Name:
Numerical Linear Algebra with Applications
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract How to choose the step size of gradient descent method has been a popular subject of research. In this paper we propose a modified limited memory steepest descent method (MLMSD). In each iteration we propose a selection rule to pick a unique step size from a candidate set, which is calculated by Fletcher’s limited memory steepest descent method (LMSD), instead of going through all the step sizes in a sweep, as in Fletcher’s original LMSD algorithm. MLMSD is motivated by an inexact super-linear convergence rate analysis. The R-linear convergence of MLMSD is proved for a strictly convex quadratic minimization problem. Numerical tests are presented to show that our algorithm is efficient and robust. 
    more » « less
  2. null (Ed.)
    Stochastic Gradient Descent (SGD) and its variants are the most used algorithms in machine learning applications. In particular, SGD with adaptive learning rates and momentum is the industry standard to train deep networks. Despite the enormous success of these methods, our theoretical understanding of these variants in the non-convex setting is not complete, with most of the results only proving convergence in expectation and with strong assumptions on the stochastic gradients. In this paper, we present a high probability analysis for adaptive and momentum algorithms, under weak assumptions on the function, stochastic gradients, and learning rates. We use it to prove for the first time the convergence of the gradients to zero in high probability in the smooth nonconvex setting for Delayed AdaGrad with momentum. 
    more » « less
  3. We present a convergence rate analysis for biased stochastic gradient descent (SGD), where individual gradient updates are corrupted by computation errors. We develop stochastic quadratic constraints to formulate a small linear matrix inequality (LMI) whose feasible points lead to convergence bounds of biased SGD. Based on this LMI condition, we develop a sequential minimization approach to analyze the intricate trade-offs that couple stepsize selection, convergence rate, optimization accuracy, and robustness to gradient inaccuracy. We also provide feasible points for this LMI and obtain theoretical formulas that quantify the convergence properties of biased SGD under various assumptions on the loss functions. 
    more » « less
  4. null (Ed.)
    State-of-the-art seismic imaging techniques treat inversion tasks such as full-waveform inversion (FWI) and least-squares reverse time migration (LSRTM) as partial differential equation-constrained optimization problems. Due to the large-scale nature, gradient-based optimization algorithms are preferred in practice to update the model iteratively. Higher-order methods converge in fewer iterations but often require higher computational costs, more line-search steps, and bigger memory storage. A balance among these aspects has to be considered. We have conducted an evaluation using Anderson acceleration (AA), a popular strategy to speed up the convergence of fixed-point iterations, to accelerate the steepest-descent algorithm, which we innovatively treat as a fixed-point iteration. Independent of the unknown parameter dimensionality, the computational cost of implementing the method can be reduced to an extremely low dimensional least-squares problem. The cost can be further reduced by a low-rank update. We determine the theoretical connections and the differences between AA and other well-known optimization methods such as L-BFGS and the restarted generalized minimal residual method and compare their computational cost and memory requirements. Numerical examples of FWI and LSRTM applied to the Marmousi benchmark demonstrate the acceleration effects of AA. Compared with the steepest-descent method, AA can achieve faster convergence and can provide competitive results with some quasi-Newton methods, making it an attractive optimization strategy for seismic inversion. 
    more » « less
  5. We develop a progressive training approach for neural networks which adaptively grows the network structure by splitting existing neurons to multiple off-springs. By leveraging a functional steepest descent idea, we derive a simple criterion for deciding the best subset of neurons to split and a splitting gradient for optimally updating the off-springs. Theoretically, our splitting strategy is a second-order functional steepest descent for escaping saddle points in an infty-Wasserstein metric space, on which the standard parametric gradient descent is a first-order steepest descent. Our method provides a new computationally efficient approach for optimizing neural network structures, especially for learning lightweight neural architectures in resource-constrained settings. 
    more » « less