skip to main content


Title: Stochastic Zeroth-Order Riemannian Derivative Estimation and Optimization
We consider stochastic zeroth-order optimization over Riemannian submanifolds embedded in Euclidean space, where the task is to solve Riemannian optimization problems with only noisy objective function evaluations. Toward this, our main contribution is to propose estimators of the Riemannian gradient and Hessian from noisy objective function evaluations, based on a Riemannian version of the Gaussian smoothing technique. The proposed estimators overcome the difficulty of nonlinearity of the manifold constraint and issues that arise in using Euclidean Gaussian smoothing techniques when the function is defined only over the manifold. We use the proposed estimators to solve Riemannian optimization problems in the following settings for the objective function: (i) stochastic and gradient-Lipschitz (in both nonconvex and geodesic convex settings), (ii) sum of gradient-Lipschitz and nonsmooth functions, and (iii) Hessian-Lipschitz. For these settings, we analyze the oracle complexity of our algorithms to obtain appropriately defined notions of ϵ-stationary point or ϵ-approximate local minimizer. Notably, our complexities are independent of the dimension of the ambient Euclidean space and depend only on the intrinsic dimension of the manifold under consideration. We demonstrate the applicability of our algorithms by simulation results and real-world applications on black-box stiffness control for robotics and black-box attacks to neural networks.  more » « less
Award ID(s):
2007797 1953210 2053918 1934568
NSF-PAR ID:
10358053
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Mathematics of Operations Research
ISSN:
0364-765X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Riemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an ϵ -stationary point with O(ϵ−3) IFOs in the online case, and O(n+n‾√ϵ−2) IFOs in the finite-sum case with n being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that use Riemannian subgradient information. 
    more » « less
  2. Functionally constrained stochastic optimization problems, where neither the objective function nor the constraint functions are analytically available, arise frequently in machine learning applications. In this work, assuming we only have access to the noisy evaluations of the objective and constraint functions, we propose and analyze stochastic zeroth-order algorithms for solving this class of stochastic optimization problem. When the domain of the functions is [Formula: see text], assuming there are m constraint functions, we establish oracle complexities of order [Formula: see text] and [Formula: see text] in the convex and nonconvex settings, respectively, where ϵ represents the accuracy of the solutions required in appropriately defined metrics. The established oracle complexities are, to our knowledge, the first such results in the literature for functionally constrained stochastic zeroth-order optimization problems. We demonstrate the applicability of our algorithms by illustrating their superior performance on the problem of hyperparameter tuning for sampling algorithms and neural network training. Funding: K. Balasubramanian was partially supported by a seed grant from the Center for Data Science and Artificial Intelligence Research, University of California–Davis, and the National Science Foundation [Grant DMS-2053918]. 
    more » « less
  3. null (Ed.)
    We present a stochastic descent algorithm for unconstrained optimization that is particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The algorithm maps the gradient onto a low-dimensional ran- dom subspace of dimension at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. Without requiring a full gradient, this mapping can be performed by computing directional deriva- tives (e.g., via forward-mode automatic differentiation). We give proofs for conver- gence in expectation under various convexity assumptions as well as probabilistic convergence results under strong-convexity. Our method provides a novel extension to the well-known Gaussian smoothing technique to descent in subspaces of dimen- sion greater than one, opening the doors to new analysis of Gaussian smoothing when more than one directional derivative is used at each iteration. We also provide a finite-dimensional variant of a special case of the Johnson–Lindenstrauss lemma. Experimentally, we show that our method compares favorably to coordinate descent, Gaussian smoothing, gradient descent and BFGS (when gradients are calculated via forward-mode automatic differentiation) on problems from the machine learning and shape optimization literature. 
    more » « less
  4. We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing (δ, ǫ)-Goldstein stationary points. Several recent works have presented randomized algorithms that produce such points using eO(δ−1ǫ−3) first-order oracle calls, independent of the dimension d. It has been an open problem as to whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of (d) for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time horizon. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of eO(δ−1ǫ−3) can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are well-known efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner (suitably defined), resolving an open question in the literature. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical “white-box” setting in which the optimizer is granted access to the network’s architecture, we propose a simple, dimension-free, deterministic smoothing of ReLU networks that provably preserves (δ, ǫ)-Goldstein stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm for slightly-smooth functions, this yields the first deterministic, dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound. 
    more » « less
  5. Given only a finite collection of points sampled from a Riemannian manifold embedded in a Euclidean space, in this paper we propose a new method to numerically solve elliptic and parabolic partial differential equations (PDEs) supplemented with boundary conditions. Since the construction of triangulations on unknown manifolds can be both difficult and expensive, both in terms of computational and data requirements, our goal is to solve these problems without a triangulation. Instead, we rely only on using the sample points to define quadrature formulas on the unknown manifold. Our main tool is the diffusion maps algorithm. We re-analyze this well-known method in a variational sense for manifolds with boundary. Our main result is that the variational diffusion maps graph Laplacian is a consistent estimator of the Dirichlet energy on the manifold. This improves upon previous results and provides a rigorous justification of the well-known relationship between diffusion maps and the Neumann eigenvalue problem. Moreover, using semigeodesic coordinates we derive the first uniform asymptotic expansion of the diffusion maps kernel integral operator for manifolds with boundary. This expansion relies on a novel lemma which relates the extrinsic Euclidean distance to the coordinate norm in a normal collar of the boundary. We then use a recently developed method of estimating the distance to boundary function (notice that the boundary location is assumed to be unknown) to construct a consistent estimator for boundary integrals. Finally, by combining these various estimators, we illustrate how to impose Dirichlet and Neumann conditions for some common PDEs based on the Laplacian. Several numerical examples illustrate our theoretical findings. 
    more » « less