Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
We study the complexity of optimizing nonsmooth nonconvex Lipschitz functions by producing (δ, ǫ)-Goldstein stationary points. Several recent works have presented randomized algorithms that produce such points using eO(δ−1ǫ−3) first-order oracle calls, independent of the dimension d. It has been an open problem as to whether a similar result can be obtained via a deterministic algorithm. We resolve this open problem, showing that randomization is necessary to obtain a dimension-free rate. In particular, we prove a lower bound of (d) for any deterministic algorithm. Moreover, we show that unlike smooth or convex optimization, access to function values is required for any deterministic algorithm to halt within any finite time horizon. On the other hand, we prove that if the function is even slightly smooth, then the dimension-free rate of eO(δ−1ǫ−3) can be obtained by a deterministic algorithm with merely a logarithmic dependence on the smoothness parameter. Motivated by these findings, we turn to study the complexity of deterministically smoothing Lipschitz functions. Though there are well-known efficient black-box randomized smoothings, we start by showing that no such deterministic procedure can smooth functions in a meaningful manner (suitably defined), resolving an open question in the literature. We then bypass this impossibility result for the structured case of ReLU neural networks. To that end, in a practical “white-box” setting in which the optimizer is granted access to the network’s architecture, we propose a simple, dimension-free, deterministic smoothing of ReLU networks that provably preserves (δ, ǫ)-Goldstein stationary points. Our method applies to a variety of architectures of arbitrary depth, including ResNets and ConvNets. Combined with our algorithm for slightly-smooth functions, this yields the first deterministic, dimension-free algorithm for optimizing ReLU networks, circumventing our lower bound.more » « lessFree, publicly-accessible full text available July 17, 2024
Numerous applications in machine learning and data analytics can be formulated as equilibrium computation over Riemannian manifolds. Despite the extensive investigation of their Euclidean counterparts, the performance of Riemannian gradient-based algorithms remain opaque and poorly understood. We revisit the original scheme of Riemannian gradient descent (RGD) and analyze it under a geodesic monotonicity assumption, which includes the well-studied geodesically convex-concave min-max optimization problem as a special case. Our main contribution is to show that, despite the phenomenon of distance distortion, the RGD scheme, with a step size that is agnostic to the manifold’s curvature, achieves a curvature-independent and linear last-iterate convergence rate in the geodesically strongly monotone setting. To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence in the Riemannian setting has not been considered before.more » « lessFree, publicly-accessible full text available June 29, 2024
From optimal transport to robust dimensionality reduction, a plethora of machine learning applications can be cast into the min-max optimization problems over Riemannian manifolds. Though many min-max algorithms have been analyzed in the Euclidean setting, it has proved elusive to translate these results to the Riemannian case. Zhang et al.  have recently shown that geodesic convex concave Riemannian problems always admit saddle-point solutions. Inspired by this result, we study whether a performance gap between Riemannian and optimal Euclidean space convex-concave algorithms is necessary. We answer this question in the negative—we prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate in the geodesically strongly-convex-concave case, matching the Euclidean result. Our results also extend to the stochastic or non-smooth case where RCEG and Riemanian gradient ascent descent (RGDA) achieve near-optimal convergence rates up to factors depending on curvature of the manifold.more » « lessFree, publicly-accessible full text available April 3, 2024
We provide a unifying framework for the design and analysis of multi-calibrated and moment- multi-calibrated predictors. Placing the multi-calibration problem in the general setting of multiobjective learning—where learning guarantees must hold simultaneously over a set of distribu- tions and loss functions—we exploit connections to game dynamics to obtain state-of-the-art guarantees for a diverse set of multi-calibration learning problems. In addition to shedding light on existing multi-calibration guarantees, and greatly simplifying their analysis, our ap- proach yields a 1/ε2 improvement in the number of oracle calls compared to the state-of-the-art algorithm of Jung et al.  for learning deterministic moment-calibrated predictors and an exponential improvement in k compared to the state-of-the-art algorithm of Gopalan et al.  for learning a k-class multi-calibrated predictor. Beyond multi-calibration, we use these game dynamics to address existing and emerging considerations in the study of group fairness and multi-distribution learning.more » « less