Title: Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction
In this paper, we study the finite-sum convex optimization problem, focusing on the general convex case. Recently, the study of variance reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in the existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated Extra-Gradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta\log\beta}{\epsilon}}\right)$ gradient evaluations to attain an $\mathcal{O}(\epsilon)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $\beta$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods, and it improves upon the convergence of the state-of-the-art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.
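To make the step-size issue concrete, here is a minimal sketch of plain (non-adaptive, non-accelerated) SVRG on a toy least-squares problem. It is not AdaVRAE or AdaVRAG, whose update rules are not given in this abstract. The toy data, the helper names (`make_problem`, `grad_i`, `full_grad`, `loss`, `svrg`), the epoch length `2 * n`, and the constant `0.1` in the step size are all illustrative assumptions. The point is only that the step size `eta` is computed from the smoothness parameter `beta`, which is the dependence an adaptive method aims to remove.

```python
import math
import random


def make_problem(n=20):
    """Toy consistent least squares: f_i(x) = 0.5 * (a_i . x - b_i)**2 (assumed data)."""
    x_star = (1.0, -2.0)  # hypothetical ground-truth minimizer
    rows = []
    for i in range(n):
        radius = 1.0 if i % 2 == 0 else 0.5
        angle = 2.0 * math.pi * i / n
        a = (radius * math.cos(angle), radius * math.sin(angle))
        b = a[0] * x_star[0] + a[1] * x_star[1]
        rows.append((a, b))
    return rows


def grad_i(row, x):
    """Gradient of a single component f_i at x."""
    a, b = row
    residual = a[0] * x[0] + a[1] * x[1] - b
    return (residual * a[0], residual * a[1])


def full_grad(rows, x):
    """Gradient of the average f = (1/n) * sum_i f_i at x."""
    grads = [grad_i(row, x) for row in rows]
    n = len(rows)
    return (sum(g[0] for g in grads) / n, sum(g[1] for g in grads) / n)


def loss(rows, x):
    """Average objective value at x."""
    return sum(0.5 * (a[0] * x[0] + a[1] * x[1] - b) ** 2 for a, b in rows) / len(rows)


def svrg(rows, epochs=30, seed=0):
    """Plain SVRG with a step size that requires knowing the smoothness parameter."""
    rng = random.Random(seed)
    n = len(rows)
    # For this objective each f_i is ||a_i||^2-smooth; beta is the largest such constant.
    beta = max(a[0] ** 2 + a[1] ** 2 for a, _ in rows)
    eta = 0.1 / beta  # illustrative constant; the dependence on beta is the point
    x = (0.0, 0.0)
    for _ in range(epochs):
        snapshot = x
        mu = full_grad(rows, snapshot)  # one full pass: n gradient evaluations
        for _ in range(2 * n):
            row = rows[rng.randrange(n)]
            gx = grad_i(row, x)
            gs = grad_i(row, snapshot)
            # Variance-reduced estimate: unbiased, and its variance shrinks near the optimum.
            v = (gx[0] - gs[0] + mu[0], gx[1] - gs[1] + mu[1])
            x = (x[0] - eta * v[0], x[1] - eta * v[1])
    return x


if __name__ == "__main__":
    rows = make_problem()
    x = svrg(rows)
    print("final point:", x, "final loss:", loss(rows, x))
```

In practice `beta` is rarely available in closed form as it is here, so `eta` would have to be tuned by hand; that tuning burden is what the abstract describes as the motivation for adaptive step sizes.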
Award ID(s):
1908510 1750333
NSF-PAR ID:
10353901
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
International Conference on Machine Learning
Page Range / eLocation ID:
13947-13994
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Techniques for reducing the variance of gradient estimates used in stochastic programming algorithms for convex finite-sum problems have received a great deal of attention in recent years. By leveraging dissipativity theory from control, we provide a new perspective on two important variance-reduction algorithms: SVRG and its direct accelerated variant Katyusha. Our perspective provides a physically intuitive understanding of the behavior of SVRG-like methods via a principle of energy conservation. The tools discussed here allow us to automate the convergence analysis of SVRG-like methods by capturing their essential properties in small semidefinite programs amenable to standard analysis and computational techniques. Our approach recovers existing convergence results for SVRG and Katyusha and generalizes the theory to alternative parameter choices. We also discuss how our approach complements the linear coupling technique. Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems. 
  2. An $\ell_p$ oblivious subspace embedding is a distribution over $r \times n$ matrices $\Pi$ such that for any fixed $n \times d$ matrix $A$, \[ \Pr_{\Pi}[\text{for all } x,\ \|Ax\|_p \le \|\Pi Ax\|_p \le \kappa \|Ax\|_p] \ge 9/10, \] where $r$ is the dimension of the embedding, $\kappa$ is the distortion of the embedding, and for an $n$-dimensional vector $y$, $\|y\|_p = (\sum_{i=1}^n |y_i|^p)^{1/p}$ is the $\ell_p$-norm. Another important property is the sparsity of $\Pi$, that is, the maximum number of non-zero entries per column, as this determines the running time of computing $\Pi A$. While for $p = 2$ there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of $1 \le p < 2$, much less was known. In this article, we obtain nearly optimal tradeoffs for $\ell_1$ oblivious subspace embeddings, as well as new tradeoffs for $1 < p < 2$. Our main results are as follows: (1) We show for every $1 \le p < 2$, any oblivious subspace embedding with dimension $r$ has distortion \[ \kappa = \Omega\left(\frac{1}{\left(\frac{1}{d}\right)^{1/p} \log^{2/p} r + \left(\frac{r}{n}\right)^{1/p - 1/2}}\right). \] When $r = \operatorname{poly}(d) \ll n$ in applications, this gives a $\kappa = \Omega(d^{1/p} \log^{-2/p} d)$ lower bound, and shows the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for $p = 1$ is optimal up to $\operatorname{poly}(\log(d))$ factors. (2) We give sparse oblivious subspace embeddings for every $1 \le p < 2$. Importantly, for $p = 1$, we achieve $r = O(d \log d)$, $\kappa = O(d \log d)$ and $s = O(\log d)$ non-zero entries per column. The best previous construction with $s \le \operatorname{poly}(\log d)$ is due to Woodruff and Zhang (COLT, 2013), giving $\kappa = \Omega(d^2 \operatorname{poly}(\log d))$ or $\kappa = \Omega(d^{3/2} \sqrt{\log n} \cdot \operatorname{poly}(\log d))$ and $r \ge d \cdot \operatorname{poly}(\log d)$; in contrast our $r = O(d \log d)$ and $\kappa = O(d \log d)$ are optimal up to $\operatorname{poly}(\log(d))$ factors even for dense matrices. We also give (1) $\ell_p$ oblivious subspace embeddings with an expected $1+\varepsilon$ number of non-zero entries per column for arbitrarily small $\varepsilon > 0$, and (2) the first oblivious subspace embeddings for $1 \le p < 2$ with $O(1)$-distortion and dimension independent of $n$. Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as entrywise $\ell_p$ low-rank approximation. Our results give improved algorithms for these applications.
  3. We investigate the behavior of higher-form symmetries at various quantum phase transitions. We consider discrete 1-form symmetries, which can be either part of the generalized concept “categorical symmetry” (labelled as $\tilde{Z}_N^{(1)}$) introduced recently, or an explicit $Z_N^{(1)}$ 1-form symmetry. We demonstrate that for many quantum phase transitions involving a $Z_N^{(1)}$ or $\tilde{Z}_N^{(1)}$ symmetry, the expectation value $\langle \left( \log O_\mathcal{C} \right)^2 \rangle$ takes the form $\langle \left( \log O_\mathcal{C} \right)^2 \rangle \sim - \frac{A}{\epsilon} P + b \log P$, where $O_\mathcal{C}$ is an operator associated with the loop $\mathcal{C}$ (or its interior $\mathcal{A}$), which reduces to the Wilson loop operator for cases with an explicit $Z_N^{(1)}$ 1-form symmetry. $P$ is the perimeter of $\mathcal{C}$, and the $b \log P$ term arises from the sharp corners of the loop $\mathcal{C}$, which is consistent with recent numerics on a particular example. $b$ is a universal microscopic-independent number, which in $(2+1)d$ is related to the universal conductivity at the quantum phase transition. $b$ can be computed exactly for certain transitions using the dualities between $(2+1)d$ conformal field theories developed in recent years. We also compute the "strange correlator" of $O_\mathcal{C}$: $S_{\mathcal{C}} = \langle 0 | O_\mathcal{C} | 1 \rangle / \langle 0 | 1 \rangle$, where $|0\rangle$ and $|1\rangle$ are many-body states with different topological nature.
  4. We study the four-point function of the lowest-lying half-BPS operators in the $\mathcal{N} = 4$ SU($N$) super-Yang-Mills theory and its relation to the flat-space four-graviton amplitude in type IIB superstring theory. We work in a large-$N$ expansion in which the complexified Yang-Mills coupling $\tau$ is fixed. In this expansion, non-perturbative instanton contributions are present, and the $\mathrm{SL}(2, \mathbb{Z})$ duality invariance of correlation functions is manifest. Our results are based on a detailed analysis of the sphere partition function of the mass-deformed SYM theory, which was previously computed using supersymmetric localization. This partition function determines a certain integrated correlator in the undeformed $\mathcal{N} = 4$ SYM theory, which in turn constrains the four-point correlator at separated points. In a normalization where the two-point functions are proportional to $N^2 - 1$ and are independent of $\tau$ and $\overline{\tau}$, we find that the terms of order $\sqrt{N}$ and $1/\sqrt{N}$ in the large-$N$ expansion of the four-point correlator are proportional to the non-holomorphic Eisenstein series $E\left(\frac{3}{2},\tau, \overline{\tau}\right)$ and $E\left(\frac{5}{2},\tau, \overline{\tau}\right)$, respectively. In the flat space limit, these terms match the corresponding terms in the type IIB S-matrix arising from $R^4$ and $D^4 R^4$ contact interactions, which, for the $R^4$ case, represents a check of AdS/CFT at finite string coupling. Furthermore, we present striking evidence that these results generalize so that, at order $N^{\frac{1}{2}-m}$ with integer $m \ge 0$, the expansion of the integrated correlator we study is a linear sum of non-holomorphic Eisenstein series with half-integer index, which are manifestly $\mathrm{SL}(2, \mathbb{Z})$ invariant.
  5. We present a new elementary algorithm that takes time $O_\epsilon\left( x^{\frac{3}{5}} (\log x)^{\frac{8}{5}+\epsilon} \right)$ and space $O\left( x^{\frac{3}{10}} (\log x)^{\frac{13}{10}} \right)$ (measured bitwise) for computing $M(x) = \sum_{n \le x} \mu(n)$, where $\mu(n)$ is the Möbius function. This is the first improvement in the exponent of $x$ for an elementary algorithm since 1985. We also show that it is possible to reduce space consumption to $O(x^{1/5} (\log x)^{5/3})$ by the use of (Helfgott in: Math Comput 89:333–350, 2020), at the cost of letting time rise to the order of $x^{3/5} (\log x)^2 \log \log x$.
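For readers unfamiliar with the quantity in item 5, the following is a minimal baseline sketch that computes $M(x) = \sum_{n \le x} \mu(n)$ by sieving the Möbius function directly. It is not the algorithm of that paper: this naive approach uses memory linear in $x$ and roughly $x \log\log x$ time, far from the sublinear bounds quoted in the abstract. The function names `mobius_sieve` and `mertens` are assumptions chosen for illustration.

```python
def mobius_sieve(limit):
    """Return a list mu where mu[n] is the Möbius function of n, for 0 <= n <= limit."""
    mu = [1] * (limit + 1)
    mu[0] = 0  # mu(0) is not defined; 0 keeps the prefix sums unaffected
    is_prime = [True] * (limit + 1)
    for p in range(2, limit + 1):
        if not is_prime[p]:
            continue
        # Each distinct prime factor flips the sign.
        for m in range(p, limit + 1, p):
            if m > p:
                is_prime[m] = False
            mu[m] = -mu[m]
        # Any multiple of p^2 is not squarefree, so mu is zero there.
        for m in range(p * p, limit + 1, p * p):
            mu[m] = 0
    return mu


def mertens(x):
    """Naive Mertens function M(x) = sum of mu(n) for 1 <= n <= x."""
    mu = mobius_sieve(x)
    return sum(mu[1:])


if __name__ == "__main__":
    for x in (10, 100, 1000):
        print(x, mertens(x))
```

A baseline like this is mainly useful for checking the output of a faster implementation on small inputs.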