


Title: A Superlinearly Convergent Subgradient Method for Sharp Semismooth Problems

Subgradient methods comprise a fundamental class of nonsmooth optimization algorithms. Classical results show that certain subgradient methods converge sublinearly for general Lipschitz convex functions and converge linearly for convex functions that grow sharply away from solutions. Recent work has moreover extended these results to certain nonconvex problems. In this work, we seek to improve the complexity of these algorithms by asking the following question: Is it possible to design a superlinearly convergent subgradient method? We provide a positive answer to this question for a broad class of sharp semismooth functions.
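As background for the abstract above, here is a minimal numpy sketch of the classical Polyak subgradient step, which attains the linear rate mentioned for sharp convex functions when the optimal value is known. It is not the paper's superlinearly convergent method, and the l1-norm test function is an illustrative assumption.

```python
import numpy as np

def polyak_subgradient(x0, f, subgrad, f_min=0.0, max_iter=500, tol=1e-10):
    """Classical Polyak subgradient method (background, not the paper's method).

    Under sharp growth, f(x) - f_min >= mu * dist(x, solution set),
    this step converges linearly to the solution set.
    """
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if fx - f_min <= tol:
            break
        g = subgrad(x)
        # Polyak step: move along -g with step length (f(x) - f_min) / ||g||^2
        x = x - (fx - f_min) / np.dot(g, g) * g
    return x

# Toy sharp example: f(x) = ||x||_1, whose minimum value is 0.
f = lambda x: np.abs(x).sum()
subgrad = lambda x: np.sign(x) + (x == 0)   # a valid subgradient of the l1 norm
x_final = polyak_subgradient(np.array([1.0, -2.0, 0.5]), f, subgrad)
print("final objective:", f(x_final))
```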

Funding: The research of D. Davis was supported by the Division of Mathematical Sciences [Grant 2047637] and the Alfred P. Sloan Foundation.

 
Award ID(s): 2047637
NSF-PAR ID: 10485557
Author(s) / Creator(s):
Publisher / Repository: Mathematics of Operations Research
Date Published:
Journal Name: Mathematics of Operations Research
ISSN: 0364-765X
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. This work considers the minimization of a general convex function f(X) over the cone of positive semi-definite matrices whose optimal solution X* is of low rank. Standard first-order convex solvers require performing an eigenvalue decomposition in each iteration, severely limiting their scalability. A natural nonconvex reformulation of the problem factors the variable X into the product of a rectangular matrix with fewer columns and its transpose. For a special class of matrix sensing and completion problems with quadratic objective functions, local search algorithms applied to the factored problem have been shown to be much more efficient and, despite the nonconvexity of the reformulation, to converge to the global optimum. The purpose of this work is to extend this line of study to general convex objective functions f(X) and investigate the geometry of the resulting factored formulations. Specifically, we prove that when f(X) satisfies the restricted well-conditioned assumption, each critical point of the factored problem either corresponds to the optimal solution X* or is a strict saddle point, where the Hessian matrix has a strictly negative eigenvalue. Such a geometric structure of the factored formulation ensures that many local search algorithms can converge to the global optimum with random initializations.
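To illustrate the factored reformulation this abstract analyzes, the sketch below runs plain gradient descent on the factor U for the toy objective f(X) = 0.5·||X − M||_F^2 with M a synthetic low-rank PSD matrix; the objective, dimensions, and step size are illustrative assumptions rather than the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 2
U_true = rng.normal(size=(n, r))
M = U_true @ U_true.T                    # low-rank PSD target

grad_f = lambda X: X - M                 # gradient of f(X) = 0.5 * ||X - M||_F^2

U = rng.normal(size=(n, r))              # random initialization
step = 1e-3                              # conservative constant step size
for _ in range(5000):
    X = U @ U.T
    # chain rule: d/dU f(U U^T) = 2 * grad_f(X) @ U (grad_f(X) is symmetric here)
    U -= step * 2.0 * grad_f(X) @ U
print("relative error:", np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))
```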
  2. This paper proposes and develops a new Newton-type algorithm to solve subdifferential inclusions defined by subgradients of extended real-valued prox-regular functions. The proposed algorithm is formulated in terms of the second-order subdifferential of such functions, which enjoys extensive calculus rules and can be efficiently computed for broad classes of extended real-valued functions. Based on this and on the metric regularity and subregularity properties of subgradient mappings, we establish verifiable conditions ensuring the well-posedness of the proposed algorithm and its local superlinear convergence. The obtained results are also new for the class of equations defined by continuously differentiable functions with Lipschitzian gradients (C^{1,1} functions), which is the underlying case of our consideration. The developed algorithms for prox-regular functions and their extension to a structured class of composite functions are formulated in terms of proximal mappings and forward-backward envelopes. Besides numerous illustrative examples and comparison with known algorithms for C^{1,1} functions and generalized equations, the paper presents applications of the proposed algorithms to regularized least-squares problems arising in statistics, machine learning, and related disciplines. Funding: Research of P. D. Khanh is funded by Ho Chi Minh City University of Education Foundation for Science and Technology [Grant CS.2022.19.20TD]. Research of B. Mordukhovich and V. T. Phat was partly supported by the U.S. National Science Foundation [Grants DMS-1808978 and DMS-2204519]. The research of B. Mordukhovich was also supported by the Air Force Office of Scientific Research [Grant 15RT0462] and the Australian Research Council under Discovery Project DP-190100555.
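The sketch below is not the paper's algorithm; it only illustrates the general idea of taking Newton-type steps on the forward-backward residual of a regularized least-squares (lasso) problem, using an element of the generalized Jacobian of the l1 prox. The problem data, step size, and safeguard are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, lam = 80, 50, 0.1
A = rng.normal(size=(m, n))
x_true = np.zeros(n); x_true[:5] = rng.normal(size=5)
b = A @ x_true
t = 1.0 / np.linalg.norm(A.T @ A, 2)              # forward (gradient) step size

prox_l1 = lambda u, tau: np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)
resid = lambda x: x - prox_l1(x - t * A.T @ (A @ x - b), t * lam)

x = np.zeros(n)
for _ in range(50):
    u = x - t * A.T @ (A @ x - b)                 # forward step
    F = x - prox_l1(u, t * lam)                   # forward-backward residual
    if np.linalg.norm(F) < 1e-10:
        break
    # One element of the generalized Jacobian of F: J = I - D (I - t A^T A),
    # where D is diagonal with 1 on coordinates where the l1 prox is locally linear at u.
    d = (np.abs(u) > t * lam).astype(float)
    J = np.eye(n) - d[:, None] * (np.eye(n) - t * A.T @ A)
    x_newton = x - np.linalg.solve(J, F)          # Newton-type step on the residual
    x_fb = prox_l1(u, t * lam)                    # plain forward-backward step
    # safeguard: keep the Newton iterate only if it reduces the residual more
    x = x_newton if np.linalg.norm(resid(x_newton)) < np.linalg.norm(resid(x_fb)) else x_fb
print("final residual norm:", np.linalg.norm(resid(x)))
```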
  3. This work bounds extreme values of state functions for a class of input-affine continuous-time systems that are affected by polyhedral-bounded uncertainty. Instances of these systems may arise in data-driven peak estimation, in which the state function must be bounded for all systems that are consistent with a set of state-derivative data records corrupted under L-infinity bounded noise. Existing occupation measure-based methods form a convergent sequence of outer approximations to the true peak value, given an initial set, by solving a hierarchy of semidefinite programs of increasing size. These techniques scale combinatorially in the number of state variables and uncertain parameters. We present tractable algorithms for peak estimation that scale linearly in the number of faces of the uncertainty-bounding polytope rather than combinatorially in the number of uncertain parameters by leveraging convex duality and a theorem of alternatives (facial decomposition). The sequence of decomposed semidefinite programs will converge to the true peak value under mild assumptions (convergence and smoothness of dynamics).
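By way of contrast with the occupation-measure SDP hierarchy described above, the sketch below is only a naive Monte Carlo baseline: it simulates a toy input-affine system under sampled admissible uncertainties and reports the largest observed value of the state function, which gives a lower estimate of the true peak (the SDP methods instead produce certified upper bounds). The dynamics, horizon, and uncertainty interval are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def f(x):                        # nominal drift: lightly damped oscillator
    return np.array([x[1], -x[0] - 0.3 * x[1]])

g = np.array([0.0, 1.0])         # input direction (dynamics affine in the uncertainty)
w_box = (-0.5, 0.5)              # polytope (here: an interval) bounding the uncertainty
p = lambda x: x[0]               # state function whose peak we estimate

T, dt = 5.0, 1e-2
peak = -np.inf
for _ in range(200):                             # sample initial points and uncertainty signals
    x = rng.uniform(-0.2, 0.2, size=2)           # sampled initial condition
    for _ in range(int(T / dt)):                 # forward-Euler integration
        w = rng.uniform(*w_box)                  # piecewise-constant admissible input
        x = x + dt * (f(x) + g * w)
        peak = max(peak, p(x))
print("sampled lower estimate of the peak:", peak)
```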
  4. Abstract Quantized or low-bit neural networks are attractive due to their inference efficiency. However, training deep neural networks with quantized activations involves minimizing a discontinuous and piecewise constant loss function. Such a loss function has zero gradient almost everywhere (a.e.), which makes conventional gradient-based algorithms inapplicable. To this end, we study a novel class of biased first-order oracles, termed coarse gradients, for overcoming the vanishing gradient issue. A coarse gradient is generated by replacing the a.e. zero derivative of the quantized (i.e., staircase) ReLU activation appearing in the chain rule with a heuristic proxy derivative called the straight-through estimator (STE). Although STEs have been widely used empirically in training quantized networks, fundamental questions such as when and why the ad hoc STE trick works still lack theoretical understanding. In this paper, we propose a class of STEs with certain monotonicity and consider their applications to the training of a two-linear-layer network with quantized activation functions for nonlinear multi-category classification. We establish performance guarantees for the proposed STEs by showing that the corresponding coarse gradient methods converge to the global minimum, which leads to perfect classification. Lastly, we present experimental results on synthetic data as well as the MNIST dataset to verify our theoretical findings and demonstrate the effectiveness of our proposed STEs.
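The coarse-gradient idea is easy to sketch: the forward pass uses a staircase (quantized) ReLU whose derivative is zero almost everywhere, and the backward pass substitutes a heuristic proxy derivative. The quantizer and clipped-ReLU proxy below are illustrative assumptions, not the specific STE class analyzed in the paper.

```python
import numpy as np

def quant_relu(x, levels=4, step=1.0):
    # staircase activation: floor-quantized, clipped ReLU with `levels` levels
    return np.clip(np.floor(x / step), 0, levels - 1) * step

def ste_backward(x, grad_out, levels=4, step=1.0):
    # straight-through estimator: pretend the activation were a clipped ReLU,
    # so the proxy derivative is 1 on (0, levels * step) and 0 elsewhere
    proxy_deriv = ((x > 0) & (x < levels * step)).astype(float)
    return grad_out * proxy_deriv

# Toy usage: one quantized-activation layer with a squared loss against a target.
x = np.array([-0.5, 0.3, 1.7, 2.6, 5.0])
target = quant_relu(np.array([0.0, 1.0, 1.0, 3.0, 3.0]))
y = quant_relu(x)
grad_y = 2 * (y - target)                 # dLoss/dy for the squared loss
coarse_grad_x = ste_backward(x, grad_y)   # coarse gradient w.r.t. the pre-activation
print("coarse gradient:", coarse_grad_x)
```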
  5. Abstract

    In this paper, we study and prove the non-asymptotic superlinear convergence rate of the Broyden class of quasi-Newton algorithms, which includes the Davidon–Fletcher–Powell (DFP) method and the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method. The asymptotic superlinear convergence rate of these quasi-Newton methods has been extensively studied in the literature, but their explicit finite-time local convergence rate is not fully investigated. In this paper, we provide a finite-time (non-asymptotic) convergence analysis for Broyden quasi-Newton algorithms under the assumptions that the objective function is strongly convex, its gradient is Lipschitz continuous, and its Hessian is Lipschitz continuous at the optimal solution. We show that in a local neighborhood of the optimal solution, the iterates generated by both DFP and BFGS converge to the optimal solution at a superlinear rate of $(1/k)^{k/2}$, where $k$ is the number of iterations. We also prove that a similar local superlinear convergence result holds when the objective function is self-concordant. Numerical experiments on several datasets confirm our explicit convergence rate bounds. Our theoretical guarantee is one of the first results that provide a non-asymptotic superlinear convergence rate for quasi-Newton methods.
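For reference, a minimal BFGS implementation on a strongly convex quadratic illustrates the fast local convergence the abstract quantifies; the test problem, Armijo backtracking, and stopping tolerance are illustrative assumptions rather than the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10
G = rng.normal(size=(n, n))
Q = G @ G.T + np.eye(n)                    # strongly convex Hessian
b = rng.normal(size=n)
f = lambda x: 0.5 * x @ Q @ x + b @ x
grad = lambda x: Q @ x + b
x_star = -np.linalg.solve(Q, b)

x, H = np.zeros(n), np.eye(n)              # iterate and inverse-Hessian approximation
for k in range(50):
    g = grad(x)
    if np.linalg.norm(g) < 1e-12:
        break
    p = -H @ g                             # quasi-Newton search direction
    t, fx = 1.0, f(x)
    while f(x + t * p) > fx + 1e-4 * t * (g @ p):
        t *= 0.5                           # backtracking (Armijo) line search
    x_new = x + t * p
    s, y = x_new - x, grad(x_new) - g
    rho = 1.0 / (y @ s)                    # y @ s > 0 since f is strongly convex
    V = np.eye(n) - rho * np.outer(s, y)
    H = V @ H @ V.T + rho * np.outer(s, s) # BFGS update of the inverse Hessian
    x = x_new
print("distance to optimum:", np.linalg.norm(x - x_star))
```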

     