


Title: The slope robustly determines convex functions
We show that the deviation between the slopes of two convex functions controls the deviation between the functions themselves. This result reveals that the slope—a one-dimensional construct—robustly determines convex functions, up to a constant of integration.
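For orientation, the following is a schematic rendering of the statement. The first display is the standard (De Giorgi) local slope; the second is only a caricature of the theorem, with the precise hypotheses, the sets over which the suprema are taken, and the modulus ρ left to the paper rather than reproduced here.

    % standard local slope of f at the point x
    |\partial f|(x) \;=\; \limsup_{y \to x,\ y \neq x} \frac{\bigl(f(x) - f(y)\bigr)^{+}}{\|x - y\|}

    % schematic form of the claim (hypotheses, domains, and the modulus \rho are assumptions, not quoted)
    \sup_{x} \bigl|\, |\partial f|(x) - |\partial g|(x) \,\bigr| \le \varepsilon
    \quad\Longrightarrow\quad
    \inf_{c \in \mathbb{R}} \, \sup_{x} \bigl| f(x) - g(x) - c \bigr| \le \rho(\varepsilon)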
Award ID(s): 2023166
NSF-PAR ID: 10435356
Journal Name: Proceedings of the American Mathematical Society
ISSN: 0002-9939
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Accurate estimation of tail probabilities of projections of high-dimensional probability measures is of relevance in high-dimensional statistics and asymptotic geometric analysis. Whereas large deviation principles identify the asymptotic exponential decay rate of probabilities, sharp large deviation estimates also provide the “prefactor” in front of the exponentially decaying term. For fixed p ∈ (1, ∞), consider independent sequences (X^(n,p))_{n∈N} and (Θ_n)_{n∈N} of random vectors, with Θ_n distributed according to the normalized cone measure on the unit ℓ_2^n sphere and X^(n,p) distributed according to the normalized cone measure on the unit ℓ_p^n sphere. For almost every realization (θ_n)_{n∈N} of (Θ_n)_{n∈N}, (quenched) sharp large deviation estimates are established for suitably normalized (scalar) projections of X^(n,p) onto θ_n that are asymptotically exact (as the dimension n tends to infinity). Furthermore, the case where X^(n,p) is instead distributed according to the uniform (or normalized volume) measure on the unit ℓ_p^n ball is also considered. In both cases, in contrast to the (quenched) large deviation rate function, the prefactor exhibits a dependence on the projection directions (θ_n)_{n∈N} that encodes additional geometric information enabling one to distinguish between projections of balls and spheres. Moreover, comparison with numerical estimates obtained by direct computation and importance sampling shows that the obtained analytical expressions for tail probabilities provide good approximations even for moderate values of n. The results on the one hand provide more accurate quantitative estimates of tail probabilities of random projections of ℓ_p^n spheres than logarithmic asymptotics, and on the other hand generalize classical sharp large deviation estimates in the spirit of Bahadur and Ranga Rao to a geometric setting. The proofs combine Fourier-analytic and probabilistic techniques. Along the way, several results of independent interest are obtained, including a simpler representation for the quenched large deviation rate function that shows it is strictly convex, a central limit theorem for random projections under a certain family of tilted measures, and multidimensional generalized Laplace asymptotics.
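    The abstract above compares its analytical tail-probability expressions against numerical estimates. A minimal Monte Carlo sketch of the direct (non-importance-sampled) side is given below in Python. It relies on the standard Schechtman–Zinn representation of the cone measure on the ℓ_p^n sphere (normalize a vector of i.i.d. generalized-Gaussian coordinates); the projection direction, the √n normalization, and the threshold a are illustrative assumptions rather than the paper's exact scaling.

        import numpy as np

        def mc_tail_estimate(n, p, theta, a, num_samples=50_000, seed=0):
            # Naive Monte Carlo estimate of P( sqrt(n) * <X^(n,p), theta> > a ), where X^(n,p)
            # follows the cone measure on the unit l_p^n sphere.  Schechtman-Zinn: if Y has
            # i.i.d. coordinates with density proportional to exp(-|t|^p), then Y / ||Y||_p
            # has exactly that cone-measure law.
            rng = np.random.default_rng(seed)
            y = rng.gamma(1.0 / p, 1.0, size=(num_samples, n)) ** (1.0 / p)
            y *= rng.choice([-1.0, 1.0], size=(num_samples, n))
            x = y / np.linalg.norm(y, ord=p, axis=1, keepdims=True)
            proj = np.sqrt(n) * (x @ theta)          # scalar projections onto the fixed direction
            return np.mean(proj > a)

        # Illustrative use: a fixed direction theta on the unit l_2^n sphere, moderate n.
        n, p = 100, 3.0
        rng = np.random.default_rng(1)
        theta = rng.standard_normal(n)
        theta /= np.linalg.norm(theta)
        print(mc_tail_estimate(n, p, theta, a=2.0))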
  2. Switching between finitely many continuous-time autonomous steepest-descent dynamics for convex functions is considered. Complete solutions are shown to converge to common minimizers of the convex functions, when such minimizers exist. The convex functions need not be smooth and may be subject to constraints. Since the common minimizers may represent consensus in a multi-agent system modeled by an undirected communication graph, several known results about asymptotic consensus are deduced as special cases. Extensions to time-varying convex functions and to dynamics given by set-valued mappings more general than subdifferentials of convex functions are included.
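    A discrete-time caricature of the switched steepest-descent idea above: at each step one of finitely many convex functions is selected and a gradient step is taken toward its minimizers. The functions, the round-robin switching rule, and the Euler step size below are illustrative assumptions; the abstract's setting is continuous-time, allows nonsmooth functions and set-valued dynamics, and its switching signals are more general.

        import numpy as np

        COMMON_MIN = np.array([1.0, 1.0])     # both illustrative functions are minimized here

        def grad_f1(x):
            # gradient of f1(x) = ||x - (1, 1)||^2
            return 2.0 * (x - COMMON_MIN)

        def grad_f2(x):
            # gradient of f2(x) = (x - (1, 1))^T diag(3, 0.5) (x - (1, 1)) / 2
            return np.array([3.0, 0.5]) * (x - COMMON_MIN)

        def switched_descent(grads, x0, step=0.05, iters=500):
            # Explicit-Euler discretization of switching among the dynamics x' = -grad f_i(x).
            x = np.asarray(x0, dtype=float)
            for k in range(iters):
                x = x - step * grads[k % len(grads)](x)   # round-robin switching signal
            return x

        print(switched_descent([grad_f1, grad_f2], x0=[4.0, -2.0]))
        # The iterate approaches the common minimizer (1, 1).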
  3. We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. To apply their filtering procedure in our heterogeneous data setting, where workers compute stochastic gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly convex and non-convex objectives and show that our convergence rates match those of vanilla SGD in the Byzantine-free setting. To bound the heterogeneity, we assume that the gradients at different workers have bounded deviation from each other, and we also provide concrete bounds on this deviation in the statistical heterogeneous data model.
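    A minimal sketch of the master-worker loop described above, in Python. The Steinhardt et al. outlier-filtering procedure itself is not reproduced; as a simple stand-in, the master aggregates with a coordinate-wise median, a different (and cruder) robust aggregator. The quadratic objective, the noise level, the single Byzantine worker, and all constants are illustrative assumptions.

        import numpy as np

        def robust_distributed_sgd(worker_grads, x0, step=0.1, iters=300):
            # Each worker reports a (possibly corrupted) stochastic gradient at the current
            # iterate; the master aggregates robustly and takes a step.
            x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                reports = np.stack([g(x) for g in worker_grads])   # (num_workers, dim)
                robust_grad = np.median(reports, axis=0)           # stand-in robust aggregator
                x = x - step * robust_grad
            return x

        # Honest workers see noisy gradients of f(x) = ||x - (2, -1)||^2 / 2;
        # one Byzantine worker reports an adversarially scaled gradient.
        rng = np.random.default_rng(0)
        target = np.array([2.0, -1.0])
        honest = lambda x: (x - target) + 0.1 * rng.standard_normal(2)
        byzantine = lambda x: -100.0 * (x - target)

        workers = [honest, honest, honest, honest, byzantine]
        print(robust_distributed_sgd(workers, x0=[0.0, 0.0]))
        # The iterate approaches (2, -1) despite the corrupt report at every round.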
  4. Abstract

    Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms—Nesterov’s accelerated gradient method for strongly convex functions (NAG-SC) and Polyak’s heavy-ball method—we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak’s heavy-ball method, but they allow the identification of a term that we refer to as “gradient correction” that is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence of the two methods. We also use the high-resolution ODE framework to study Nesterov’s accelerated gradient method for (non-strongly) convex functions (NAG-C), uncovering a hitherto unknown result—that NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.

     
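    A small numerical sketch, in Python, of the discrete-time distinction the abstract draws: written as two-step recursions in x, NAG-SC differs from Polyak's heavy-ball method by an extra term built from consecutive gradients (the "gradient correction"). The quadratic test function, the step size s, and the momentum parameter beta below are illustrative assumptions; the high-resolution ODE analysis itself is not reproduced here.

        import numpy as np

        def heavy_ball(grad, x0, s, beta, iters):
            # x_{k+1} = x_k - s*grad(x_k) + beta*(x_k - x_{k-1})
            x_prev = x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                x, x_prev = x - s * grad(x) + beta * (x - x_prev), x
            return x

        def nag_sc(grad, x0, s, beta, iters):
            # Same recursion plus the "gradient correction" -beta*s*(grad(x_k) - grad(x_{k-1})),
            # i.e. the single-variable form of y_{k+1} = x_k - s*grad(x_k),
            # x_{k+1} = y_{k+1} + beta*(y_{k+1} - y_k).
            x_prev = x = np.asarray(x0, dtype=float)
            for _ in range(iters):
                x_new = (x - s * grad(x) + beta * (x - x_prev)
                         - beta * s * (grad(x) - grad(x_prev)))
                x_prev, x = x, x_new
            return x

        # Ill-conditioned strongly convex quadratic: f(x) = x^T diag(1, mu) x / 2, mu = 0.01.
        mu, s = 0.01, 1.0
        grad = lambda x: np.array([1.0, mu]) * x
        beta = (1.0 - np.sqrt(mu * s)) / (1.0 + np.sqrt(mu * s))
        x0 = np.array([5.0, 5.0])
        print("heavy-ball distance to optimum:", np.linalg.norm(heavy_ball(grad, x0, s, beta, 200)))
        print("NAG-SC     distance to optimum:", np.linalg.norm(nag_sc(grad, x0, s, beta, 200)))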