skip to main content


Title: Convergence of Hyperbolic Neural Networks Under Riemannian Stochastic Gradient Descent
Abstract

We prove, under mild conditions, the convergence of a Riemannian gradient descent method for a hyperbolic neural network regression model, both in batch gradient descent and stochastic gradient descent. We also discuss a Riemannian version of the Adam algorithm. We show numerical simulations of these algorithms on various benchmarks.

 
more » « less
Award ID(s):
1952644 2151235 1924548
NSF-PAR ID:
10467789
Author(s) / Creator(s):
; ;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Communications on Applied Mathematics and Computation
ISSN:
2096-6385
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We present a direct (primal only) derivation of Mirror Descent as a “partial” discretization of gradient flow on a Riemannian manifold where the metric tensor is the Hessian of the Mirror Descent potential function. We contrast this discretization to Natural Gradient Descent, which is obtained by a “full” forward Euler discretization. This view helps shed light on the relationship between the methods and allows generalizing Mirror Descent to any Riemannian geometry in Rd, even when the metric tensor is not a Hessian, and thus there is no “dual.” 
    more » « less
  2. Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remain opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold’s curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before. 
    more » « less
  3. From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative—we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold. 
    more » « less
  4. Abstract

    We present a new feasible proximal gradient method for constrained optimization where both the objective and constraint functions are given by summation of a smooth, possibly nonconvex function and a convex simple function. The algorithm converts the original problem into a sequence of convex subproblems. Formulating those subproblems requires the evaluation of at most one gradient-value of the original objective and constraint functions. Either exact or approximate subproblems solutions can be computed efficiently in many cases. An important feature of the algorithm is the constraint level parameter. By carefully increasing this level for each subproblem, we provide a simple solution to overcome the challenge of bounding the Lagrangian multipliers and show that the algorithm follows a strictly feasible solution path till convergence to the stationary point. We develop a simple, proximal gradient descent type analysis, showing that the complexity bound of this new algorithm is comparable to gradient descent for the unconstrained setting which is new in the literature. Exploiting this new design and analysis technique, we extend our algorithms to some more challenging constrained optimization problems where (1) the objective is a stochastic or finite-sum function, and (2) structured nonsmooth functions replace smooth components of both objective and constraint functions. Complexity results for these problems also seem to be new in the literature. Finally, our method can also be applied to convex function constrained problems where we show complexities similar to the proximal gradient method.

     
    more » « less
  5. While matrix-covariate regression models have been studied in many existing works, classical statistical and computational methods for the analysis of the regression coefficient estimation are highly affected by high dimensional matrix-valued covariates. To address these issues, this paper proposes a framework of matrix-covariate regression models based on a low-rank constraint and an additional regularization term for structured signals, with considerations of models of both continuous and binary responses. We propose an efficient Riemannian-steepest-descent algorithm for regression coefficient estimation. We prove that the consistency of the proposed estimator is in the order of O(sqrt{r(q+m)+p}/sqrt{n}), where r is the rank, p x m is the dimension of the coefficient matrix and p is the dimension of the coefficient vector. When the rank r is small, this rate improves over O(sqrt{qm+p}/sqrt{n}), the consistency of the existing work (Li et al. in Electron J Stat 15:1909-1950, 2021) that does not apply a rank constraint. In addition, we prove that all accumulation points of the iterates have similar estimation errors asymptotically and substantially attaining the minimax rate. We validate the proposed method through a simulated dataset on two-dimensional shape images and two real datasets of brain signals and microscopic leucorrhea images. 
    more » « less