We consider stochastic zeroth-order optimization over Riemannian submanifolds embedded in Euclidean space, where the task is to solve Riemannian optimization problems with only noisy objective function evaluations. Toward this, our main contribution is to propose estimators of the Riemannian gradient and Hessian from noisy objective function evaluations, based on a Riemannian version of the Gaussian smoothing technique. The proposed estimators overcome the difficulty of nonlinearity of the manifold constraint and issues that arise in using Euclidean Gaussian smoothing techniques when the function is defined only over the manifold. We use the proposed estimators to solve Riemannian optimization problems in the following settings for the objective function: (i) stochastic and gradient-Lipschitz (in both nonconvex and geodesic convex settings), (ii) sum of gradient-Lipschitz and nonsmooth functions, and (iii) Hessian-Lipschitz. For these settings, we analyze the oracle complexity of our algorithms to obtain appropriately defined notions of ϵ-stationary point or ϵ-approximate local minimizer. Notably, our complexities are independent of the dimension of the ambient Euclidean space and depend only on the intrinsic dimension of the manifold under consideration. We demonstrate the applicability of our algorithms by simulation results and real-world applications on black-box stiffness control for robotics and black-box attacks to neural networks.
more »
« less
Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel Manifold
Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an ϵ -stationary point with O(ϵ−3) IFOs in the online case, and O(n+n‾√ϵ−2) IFOs in the finite-sum case with n being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that use Riemannian subgradient information.
more »
« less
- Award ID(s):
- 1934568
- PAR ID:
- 10466276
- Date Published:
- Journal Name:
- Journal of machine learning research
- Volume:
- 23
- Issue:
- 106
- ISSN:
- 1532-4435
- Page Range / eLocation ID:
- 1-33
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We consider a class of Riemannian optimization problems where the objective is the sum of a smooth function and a nonsmooth function considered in the ambient space. This class of problems finds important applications in machine learning and statistics, such as sparse principal component analysis, sparse spectral clustering, and orthogonal dictionary learning. We propose a Riemannian alternating direction method of multipliers (ADMM) to solve this class of problems. Our algorithm adopts easily computable steps in each iteration. The iteration complexity of the proposed algorithm for obtaining an ϵ-stationary point is analyzed under mild assumptions. Existing ADMMs for solving nonconvex problems either do not allow a nonconvex constraint set or do not allow a nonsmooth objective function. Our algorithm is the first ADMM-type algorithm that minimizes a nonsmooth objective over manifold—a particular nonconvex set. Numerical experiments are conducted to demonstrate the advantage of the proposed method. Funding: The research of S. Ma was supported in part by the Office of Naval Research [Grant N00014-24-1-2705]; the National Science Foundation [Grants DMS-2243650, CCF-2308597, CCF-2311275, and ECCS-2326591]; the University of California, Davis Center for Data Science and Artificial Intelligence Research Innovative Data Science Seed Funding Program; and Rice University start-up fund.more » « less
-
Nonconvex and nonsmooth problems have recently attracted considerable attention in machine learning. However, developing efficient methods for the nonconvex and nonsmooth optimization problems with certain performance guarantee remains a challenge. Proximal coordinate descent (PCD) has been widely used for solving optimization problems, but the knowledge of PCD methods in the nonconvex setting is very limited. On the other hand, the asynchronous proximal coordinate descent (APCD) recently have received much attention in order to solve large-scale problems. However, the accelerated variants of APCD algorithms are rarely studied. In this project, we extend APCD method to the accelerated algorithm (AAPCD) for nonsmooth and nonconvex problems that satisfies the sufficient descent property, by comparing between the function values at proximal update and a linear extrapolated point using a delay-aware momentum value. To the best of our knowledge, we are the first to provide stochastic and deterministic accelerated extension of APCD algorithms for general nonconvex and nonsmooth problems ensuring that for both bounded delays and unbounded delays every limit point is a critical point. By leveraging Kurdyka-Łojasiewicz property, we will show linear and sublinear convergence rates for the deterministic AAPCD with bounded delays. Numerical results demonstrate the practical efficiency of our algorithm in speed.more » « less
-
arXiv:2311.11180v1 (Ed.)This paper presents a subgradient-based algorithm for constrained nonsmooth convex optimization that does not require projections onto the feasible set. While the well-established Frank-Wolfe algorithm and its variants already avoid projections, they are primarily designed for smooth objective functions. In con- trast, our proposed algorithm can handle nonsmooth problems with general convex functional inequality constraints. It achieves an ϵ-suboptimal solution in O(ϵ^−2) iterations, with each iteration requiring only a single (potentially inexact) Linear Minimization Oracle (LMO) call and a (possibly inexact) subgra- dient computation. This performance is consistent with existing lower bounds. Similar performance is observed when deterministic subgradients are replaced with stochastic subgradients. In the special case where there are no functional inequality constraints, our algorithm competes favorably with a recent nonsmooth projection-free method designed for constraint-free problems. Our approach uti- lizes a simple separation scheme in conjunction with a new Lagrange multiplier update rule.more » « less
-
null (Ed.)We initiate the study of quantum algorithms for escaping from saddle points with provable guarantee. Given a function f : R n → R , our quantum algorithm outputs an ϵ -approximate second-order stationary point using O ~ ( log 2 ( n ) / ϵ 1.75 ) queries to the quantum evaluation oracle (i.e., the zeroth-order oracle). Compared to the classical state-of-the-art algorithm by Jin et al. with O ~ ( log 6 ( n ) / ϵ 1.75 ) queries to the gradient oracle (i.e., the first-order oracle), our quantum algorithm is polynomially better in terms of log n and matches its complexity in terms of 1 / ϵ . Technically, our main contribution is the idea of replacing the classical perturbations in gradient descent methods by simulating quantum wave equations, which constitutes the improvement in the quantum query complexity with log n factors for escaping from saddle points. We also show how to use a quantum gradient computation algorithm due to Jordan to replace the classical gradient queries by quantum evaluation queries with the same complexity. Finally, we also perform numerical experiments that support our theoretical findings.more » « less
An official website of the United States government

