- PAR ID:
- 10358053
- Date Published:
- Journal Name:
- Mathematics of Operations Research
- ISSN:
- 0364-765X
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Riemannian Stochastic Proximal Gradient Methods for Nonsmooth Optimization over the Stiefel ManifoldRiemannian optimization has drawn a lot of attention due to its wide applications in practice. Riemannian stochastic first-order algorithms have been studied in the literature to solve large-scale machine learning problems over Riemannian manifolds. However, most of the existing Riemannian stochastic algorithms require the objective function to be differentiable, and they do not apply to the case where the objective function is nonsmooth. In this paper, we present two Riemannian stochastic proximal gradient methods for minimizing nonsmooth function over the Stiefel manifold. The two methods, named R-ProxSGD and R-ProxSPB, are generalizations of proximal SGD and proximal SpiderBoost in Euclidean setting to the Riemannian setting. Analysis on the incremental first-order oracle (IFO) complexity of the proposed algorithms is provided. Specifically, the R-ProxSPB algorithm finds an ϵ -stationary point with O(ϵ−3) IFOs in the online case, and O(n+n‾√ϵ−2) IFOs in the finite-sum case with n being the number of summands in the objective. Experimental results on online sparse PCA and robust low-rank matrix completion show that our proposed methods significantly outperform the existing methods that use Riemannian subgradient information.more » « less
-
In this paper, we study the generalized subdifferentials and the Riemannian gradient subconsistency that are the basis for non-Lipschitz optimization on embedded submanifolds of [Formula: see text]. We then propose a Riemannian smoothing steepest descent method for non-Lipschitz optimization on complete embedded submanifolds of [Formula: see text]. We prove that any accumulation point of the sequence generated by the Riemannian smoothing steepest descent method is a stationary point associated with the smoothing function employed in the method, which is necessary for the local optimality of the original non-Lipschitz problem. We also prove that any accumulation point of the sequence generated by our method that satisfies the Riemannian gradient subconsistency is a limiting stationary point of the original non-Lipschitz problem. Numerical experiments are conducted to demonstrate the advantages of Riemannian [Formula: see text] [Formula: see text] optimization over Riemannian [Formula: see text] optimization for finding sparse solutions and the effectiveness of the proposed method.
Funding: C. Zhang was supported in part by the National Natural Science Foundation of China [Grant 12171027] and the Natural Science Foundation of Beijing [Grant 1202021]. X. Chen was supported in part by the Hong Kong Research Council [Grant PolyU15300219]. S. Ma was supported in part by the National Science Foundation [Grants DMS-2243650 and CCF-2308597], the UC Davis Center for Data Science and Artificial Intelligence Research Innovative Data Science Seed Funding Program, and a startup fund from Rice University.
-
Functionally constrained stochastic optimization problems, where neither the objective function nor the constraint functions are analytically available, arise frequently in machine learning applications. In this work, assuming we only have access to the noisy evaluations of the objective and constraint functions, we propose and analyze stochastic zeroth-order algorithms for solving this class of stochastic optimization problem. When the domain of the functions is [Formula: see text], assuming there are m constraint functions, we establish oracle complexities of order [Formula: see text] and [Formula: see text] in the convex and nonconvex settings, respectively, where ϵ represents the accuracy of the solutions required in appropriately defined metrics. The established oracle complexities are, to our knowledge, the first such results in the literature for functionally constrained stochastic zeroth-order optimization problems. We demonstrate the applicability of our algorithms by illustrating their superior performance on the problem of hyperparameter tuning for sampling algorithms and neural network training. Funding: K. Balasubramanian was partially supported by a seed grant from the Center for Data Science and Artificial Intelligence Research, University of California–Davis, and the National Science Foundation [Grant DMS-2053918].more » « less
-
We study the impact of nonconvexity on the complexity of nonsmooth optimization, emphasizing objectives such as piecewise linear functions, which may not be weakly convex. We focus on a dimension-independent analysis, slightly modifying a 2020 black-box algorithm of Zhang-Lin-Jegelka-Sra-Jadbabaie that approximates an ϵ-stationary point of any directionally differentiable Lipschitz objective using [Formula: see text] calls to a specialized subgradient oracle and a randomized line search. Seeking by contrast a deterministic method, we present a simple black-box version that achieves [Formula: see text] for any difference-of-convex objective and [Formula: see text] for the weakly convex case. Our complexity bound depends on a natural nonconvexity modulus that is related, intriguingly, to the negative part of directional second derivatives of the objective, understood in the distributional sense.
Funding: This work was supported by the National Science Foundation [Grant DMS-2006990].
-
null (Ed.)We present a stochastic descent algorithm for unconstrained optimization that is particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The algorithm maps the gradient onto a low-dimensional ran- dom subspace of dimension at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. Without requiring a full gradient, this mapping can be performed by computing directional deriva- tives (e.g., via forward-mode automatic differentiation). We give proofs for conver- gence in expectation under various convexity assumptions as well as probabilistic convergence results under strong-convexity. Our method provides a novel extension to the well-known Gaussian smoothing technique to descent in subspaces of dimen- sion greater than one, opening the doors to new analysis of Gaussian smoothing when more than one directional derivative is used at each iteration. We also provide a finite-dimensional variant of a special case of the Johnson–Lindenstrauss lemma. Experimentally, we show that our method compares favorably to coordinate descent, Gaussian smoothing, gradient descent and BFGS (when gradients are calculated via forward-mode automatic differentiation) on problems from the machine learning and shape optimization literature.more » « less