

Title: Riemannian Optimization via Frank-Wolfe Methods
Abstract

We study projection-free methods for constrained Riemannian optimization. In particular, we propose a Riemannian Frank-Wolfe (RFW) method that handles constraints directly, in contrast to prior methods that rely on (potentially costly) projections. We analyze non-asymptotic convergence rates of RFW to an optimum for geodesically convex problems, and to a critical point for nonconvex objectives. We also present a practical setting under which RFW can attain a linear convergence rate. As a concrete example, we specialize RFW to the manifold of positive definite matrices and apply it to two tasks: (i) computing the matrix geometric mean (Riemannian centroid); and (ii) computing the Bures-Wasserstein barycenter. Both tasks involve geodesically convex interval constraints, for which we show that the Riemannian “linear” oracle required by RFW admits a closed-form solution; this result may be of independent interest. We complement our theoretical results with an empirical comparison of RFW against state-of-the-art Riemannian optimization methods, and observe that RFW performs competitively on the task of computing Riemannian centroids.
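To make the projection-free template concrete, here is a minimal sketch of a generic Frank-Wolfe loop of the kind described above, with the exponential/logarithm maps, the Riemannian gradient, and the “linear” oracle supplied as callables. The demo at the bottom instantiates these maps only for the flat Euclidean case with a box constraint; it is not the positive definite specialization from the paper, and the function names and step-size choice are illustrative assumptions.

```python
import numpy as np

def rfw(x0, grad_f, lin_oracle, exp, log, num_iters=100):
    """Generic projection-free Frank-Wolfe loop on a manifold (sketch)."""
    x = x0
    for k in range(num_iters):
        g = grad_f(x)                  # Riemannian gradient at the current iterate
        z = lin_oracle(x, g)           # argmin over the constraint set of <g, log(x, z)>
        step = 2.0 / (k + 2.0)         # classical Frank-Wolfe step-size schedule
        x = exp(x, step * log(x, z))   # step along the geodesic from x toward z
    return x

if __name__ == "__main__":
    # Flat-space demo: minimize ||x - c||^2 over the box [-1, 1]^3.
    c = np.array([0.3, -2.0, 0.7])
    grad_f = lambda x: 2.0 * (x - c)
    lin_oracle = lambda x, g: -np.sign(g)   # box "linear" oracle: vertex opposing g
    exp = lambda x, v: x + v                # Euclidean exponential map
    log = lambda x, y: y - x                # Euclidean logarithm map
    print(rfw(np.zeros(3), grad_f, lin_oracle, exp, log))  # approaches [0.3, -1.0, 0.7]
```

On a manifold the usual convex combination of the iterate and the oracle point is replaced by a step along the geodesic joining them, which is what the `exp(x, step * log(x, z))` line expresses.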

 
NSF-PAR ID:
10369004
Author(s) / Creator(s):
;
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Mathematical Programming
Volume:
199
Issue:
1-2
ISSN:
0025-5610
Page Range / eLocation ID:
p. 525-556
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We study stochastic projection-free methods for constrained optimization of smooth functions on Riemannian manifolds, i.e., with additional constraints beyond the parameter domain being a manifold. Specifically, we introduce stochastic Riemannian Frank–Wolfe (FW) methods for nonconvex and geodesically convex problems. We present algorithms for both purely stochastic optimization and finite-sum problems. For the latter, we develop variance-reduced methods, including a Riemannian adaptation of the recently proposed Spider technique. For all settings, we recover convergence rates that are comparable to the best-known rates for their Euclidean counterparts. Finally, we discuss applications to two classic tasks: the computation of the Karcher mean of positive definite matrices and Wasserstein barycenters for multivariate normal distributions. For both tasks, stochastic FW methods yield state-of-the-art empirical performance.
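As a rough illustration of the variance-reduction idea mentioned above, the sketch below combines a Frank-Wolfe loop with a Spider-style recursive gradient estimator for a finite sum, written only for the flat Euclidean case; the Riemannian versions additionally transport the previous estimate between iterates, which is omitted here. The names, refresh period, and demo problem are illustrative assumptions.

```python
import numpy as np

def spider_fw(x0, grad_i, n, lin_oracle, num_iters=200, refresh=10, batch=8):
    """Finite-sum Frank-Wolfe with a recursive (Spider-type) gradient estimator (flat case)."""
    rng = np.random.default_rng(0)
    x, x_prev, v = x0.copy(), x0.copy(), None
    for k in range(num_iters):
        if k % refresh == 0:
            # Periodic full-gradient refresh keeps the recursive estimator anchored.
            v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
        else:
            # Cheap recursive correction on a small minibatch evaluated at both iterates.
            idx = rng.integers(0, n, size=batch)
            v = v + np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in idx], axis=0)
        z = lin_oracle(x, v)                 # linear minimization over the constraint set
        step = 2.0 / (k + 2.0)
        x_prev, x = x, x + step * (z - x)    # Euclidean Frank-Wolfe update
    return x

if __name__ == "__main__":
    # Demo: minimize (1/n) * sum_i ||x - C[i]||^2 over the box [-1, 1]^3.
    C = np.random.default_rng(1).normal(size=(20, 3))
    grad_i = lambda x, i: 2.0 * (x - C[i])
    lin_oracle = lambda x, v: -np.sign(v)
    print(spider_fw(np.zeros(3), grad_i, len(C), lin_oracle))  # approx. clipped mean of C
```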
  2. Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remains opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold’s curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before.
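For orientation, the following is a bare-bones sketch of an RGD-style iteration applied to a monotone vector field, with the exponential map and the field passed in as callables; the demo uses the flat case with a strongly monotone affine field, where the iteration reduces to a classical fixed-point update. The step size and problem data are illustrative assumptions, not the curvature-agnostic choice analyzed in the paper.

```python
import numpy as np

def rgd(x0, field, exp, step, num_iters=200):
    """Last-iterate RGD sketch: repeatedly step along the geodesic against the field."""
    x = x0
    for _ in range(num_iters):
        x = exp(x, -step * field(x))
    return x

if __name__ == "__main__":
    # Strongly monotone affine field F(x) = A x - b with A positive definite.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    b = np.array([1.0, -1.0])
    field = lambda x: A @ x - b
    exp = lambda x, v: x + v                       # Euclidean exponential map
    print(rgd(np.zeros(2), field, exp, step=0.2))  # converges linearly to A^{-1} b
```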
  3. From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast as min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al. [2022] have recently shown that geodesically convex-concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean convex-concave algorithms is necessary. We answer this question in the negative: we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic and non-smooth cases, where RCEG and Riemannian gradient descent ascent (RGDA) achieve near-optimal convergence rates up to factors depending on the curvature of the manifold.
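The corrected extragradient template can be sketched compactly in the flat case: take an exploratory step, then update using the field evaluated at the midpoint. The Riemannian method above additionally parallel-transports the midpoint field back to the base point, which is omitted in this simplified sketch; the bilinear demo problem and step size are illustrative assumptions.

```python
import numpy as np

def extragradient(z0, field, step, num_iters=200):
    """Extragradient sketch: z <- z - step * F(z - step * F(z)) (Euclidean case)."""
    z = z0
    for _ in range(num_iters):
        z_mid = z - step * field(z)      # exploratory half-step
        z = z - step * field(z_mid)      # corrected update from the base point
    return z

if __name__ == "__main__":
    # Bilinear saddle problem min_x max_y x*y: the field rotates around the saddle
    # point (0, 0), so plain simultaneous gradient descent-ascent cycles while the
    # extragradient iteration contracts toward the saddle point.
    field = lambda z: np.array([z[1], -z[0]])
    print(extragradient(np.array([1.0, 1.0]), field, step=0.5))
```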
  4. In this paper, we develop a distributed consensus algorithm for agents whose states evolve on a manifold. This algorithm is complementary to traditional consensus, which is predominantly developed for systems with dynamics on vector spaces. We provide theoretical convergence guarantees for the proposed manifold consensus provided that the agents are initialized within a geodesically convex (g-convex) set. This condition on the initialization is not restrictive, as g-convex sets may be comparatively “large” for relevant Riemannian manifolds. Our approach to manifold consensus builds upon the notion of the Riemannian Center of Mass (RCM) and the intrinsic structure of the manifold to avoid projections in the ambient space. We first show that on a g-convex ball, all states coincide if and only if each agent’s state is the RCM of its neighbors’ states. This observation facilitates our convergence guarantee to the consensus submanifold. Finally, we provide simulation results that exemplify the linear convergence rate of the proposed algorithm and illustrate its statistical properties over randomly generated problem instances.
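A toy version of an intrinsic consensus update is sketched below on the unit sphere: each agent steps toward the Riemannian center of mass of its neighbors using only the sphere's exponential and logarithm maps, with no ambient-space projection. The communication graph, step size, and initialization inside a small geodesic ball are illustrative assumptions rather than the construction from the paper.

```python
import numpy as np

def sphere_log(x, y):
    """Logarithm map on the unit sphere: tangent vector at x pointing toward y."""
    c = np.clip(x @ y, -1.0, 1.0)
    v = y - c * x
    nv = np.linalg.norm(v)
    return np.zeros_like(x) if nv < 1e-12 else np.arccos(c) * v / nv

def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the great circle from x along v."""
    nv = np.linalg.norm(v)
    return x if nv < 1e-12 else np.cos(nv) * x + np.sin(nv) * v / nv

def consensus_step(states, neighbors, alpha=0.5):
    """One synchronous round: each agent moves toward the mean direction of its neighbors."""
    new_states = []
    for i, x in enumerate(states):
        v = np.mean([sphere_log(x, states[j]) for j in neighbors[i]], axis=0)
        new_states.append(sphere_exp(x, alpha * v))
    return new_states

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts = rng.normal(size=(4, 3)) + 3.0                        # clustered, hence in a small ball
    states = [p / np.linalg.norm(p) for p in pts]
    neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # ring communication graph
    for _ in range(50):
        states = consensus_step(states, neighbors)
    print(np.round(states, 3))                                 # all four states nearly coincide
```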
  5. Abstract In this paper we consider large-scale smooth optimization problems with multiple linear coupled constraints. Due to the non-separability of the constraints, arbitrary random sketching is not guaranteed to work. Thus, we first investigate necessary and sufficient conditions on the sketch sampling that yield well-defined algorithms. Based on these sampling conditions we develop new sketch descent methods for solving general smooth linearly constrained problems, in particular, the random sketch descent (RSD) and accelerated random sketch descent (A-RSD) methods. To our knowledge, this is the first convergence analysis of RSD algorithms for optimization problems with multiple non-separable linear constraints. For the general case, when the objective function is smooth and non-convex, we prove that the non-accelerated variant achieves a sublinear rate in expectation for an appropriate optimality measure. In the smooth convex case, we derive sublinear convergence rates in the expected objective value for both algorithms, RSD and A-RSD. Additionally, if the objective function satisfies a strong convexity type condition, both algorithms converge linearly in expectation. In special cases where complexity bounds are known for particular sketching algorithms, such as coordinate descent methods for optimization problems with a single linear coupled constraint, our theory recovers the best known bounds. Finally, we present several numerical examples to illustrate the performance of our new algorithms.
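One natural way to realize a sketch-descent step that preserves coupled linear constraints is to restrict each update to a random coordinate block and project the block gradient onto the null space of the corresponding constraint columns, so that A x = b continues to hold. The code below is a simplified stand-in under that assumption, not necessarily the exact RSD update from the paper; in particular, requiring the block to be larger than the rank of the sampled constraint columns is only a crude analogue of the sampling conditions discussed above, and the demo problem is illustrative.

```python
import numpy as np

def rsd(x0, grad_f, A, step=0.25, block=2, num_iters=2000, seed=0):
    """Random-block descent that keeps the coupled constraints A x = b satisfied."""
    rng = np.random.default_rng(seed)
    x = x0.copy()
    n = x.size
    for _ in range(num_iters):
        S = rng.choice(n, size=block, replace=False)        # sampled coordinate block
        A_S = A[:, S]
        # Project the block gradient onto null(A_S) so the update never leaves the
        # feasible affine subspace {x : A x = b}.
        P = np.eye(block) - np.linalg.pinv(A_S) @ A_S
        x[S] -= step * (P @ grad_f(x)[S])
    return x

if __name__ == "__main__":
    # Demo: minimize ||x - c||^2 subject to the single coupled constraint sum(x) = 1.
    c = np.array([2.0, -1.0, 0.5, 3.0])
    grad_f = lambda x: 2.0 * (x - c)
    A = np.ones((1, 4))
    x0 = np.array([1.0, 0.0, 0.0, 0.0])                     # feasible starting point
    x = rsd(x0, grad_f, A)
    print(np.round(x, 3), round(x.sum(), 6))                # optimum is c + (1 - c.sum()) / 4
```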