Abstract: We present a new elementary algorithm that takes time $O_\epsilon\left( x^{3/5} (\log x)^{8/5+\epsilon} \right)$ and space $O\left( x^{3/10} (\log x)^{13/10} \right)$ (measured bitwise) for computing $M(x) = \sum_{n \le x} \mu(n)$, where $\mu(n)$ is the Möbius function. This is the first improvement in the exponent of $x$ for an elementary algorithm since 1985. We also show that it is possible to reduce space consumption to $O(x^{1/5} (\log x)^{5/3})$ by the use of (Helfgott in: Math Comput 89:333–350, 2020), at the cost of letting time rise to the order of $x^{3/5} (\log x)^2 \log \log x$.
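For scale, a minimal reference implementation, not the paper's algorithm: the classical sieve below computes $M(x)$ in roughly $O(x \log \log x)$ time and $O(x)$ space, and is useful mainly as a correctness check against faster methods.

```python
# Baseline sieve for the Mertens function M(x) = sum_{n <= x} mu(n).
# This is NOT the paper's O(x^{3/5})-time elementary algorithm; it is
# the classical O(x log log x)-time, O(x)-space reference computation.

def mertens(x: int) -> int:
    mu = [1] * (x + 1)
    is_prime = [True] * (x + 1)
    for p in range(2, x + 1):
        if is_prime[p]:
            for m in range(p, x + 1, p):
                if m > p:
                    is_prime[m] = False
                mu[m] = -mu[m]          # flip sign for each prime factor
            for m in range(p * p, x + 1, p * p):
                mu[m] = 0               # n divisible by p^2 is not squarefree
    return sum(mu[1:x + 1])

assert mertens(10) == -1  # mu(1..10) = 1,-1,-1,0,-1,1,-1,0,0,1
```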
Adaptive Accelerated (Extra)Gradient Methods with Variance Reduction
In this paper, we study the finite-sum convex optimization problem, focusing on the general convex case. Recently, the study of variance-reduced (VR) methods and their accelerated variants has made exciting progress. However, the step size used in existing VR algorithms typically depends on the smoothness parameter, which is often unknown and requires tuning in practice. To address this problem, we propose two novel adaptive VR algorithms: Adaptive Variance Reduced Accelerated ExtraGradient (AdaVRAE) and Adaptive Variance Reduced Accelerated Gradient (AdaVRAG). Our algorithms do not require knowledge of the smoothness parameter. AdaVRAE uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta}{\epsilon}}\right)$ and AdaVRAG uses $\mathcal{O}\left(n\log\log n+\sqrt{\frac{n\beta\log\beta}{\epsilon}}\right)$ gradient evaluations to attain an $\mathcal{O}(\epsilon)$-suboptimal solution, where $n$ is the number of functions in the finite sum and $\beta$ is the smoothness parameter. This result matches the best-known convergence rate of non-adaptive VR methods, and it improves upon the convergence of the state-of-the-art adaptive VR method, AdaSVRG. We demonstrate the superior performance of our algorithms compared with previous methods in experiments on real-world datasets.
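For intuition about the "VR" ingredient these methods share, here is a sketch of the classical (non-adaptive, non-accelerated) SVRG loop; the function names and the fixed `step` parameter are illustrative, and the fixed step is precisely the smoothness-dependent quantity that AdaVRAE/AdaVRAG avoid.

```python
import numpy as np

# Classical SVRG loop for min_x (1/n) sum_i f_i(x). Illustrative only:
# AdaVRAE/AdaVRAG add acceleration and an adaptive step-size rule, but
# the variance-reduced gradient estimator below is the shared ingredient.

def svrg(grad_i, n, x0, step, epochs=20, inner=None):
    """grad_i(i, x) returns the gradient of f_i at x."""
    x = x0.copy()
    m = inner or n
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = sum(grad_i(i, snapshot) for i in range(n)) / n
        for _ in range(m):
            i = np.random.randint(n)
            # Control-variate estimator: unbiased, with variance that
            # shrinks as x approaches the snapshot point.
            g = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x -= step * g
    return x
```

In classical analyses `step` is taken on the order of $1/\beta$ for $\beta$-smooth $f_i$; the adaptive methods in the paper replace this with a rule that needs no knowledge of $\beta$.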
NSF-PAR ID: 10353901
Date Published:
Journal Name: International Conference on Machine Learning
Page Range / eLocation ID: 13947-13994
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this


Techniques for reducing the variance of gradient estimates used in stochastic programming algorithms for convex finite-sum problems have received a great deal of attention in recent years. By leveraging dissipativity theory from control, we provide a new perspective on two important variance-reduction algorithms: SVRG and its direct accelerated variant Katyusha. Our perspective provides a physically intuitive understanding of the behavior of SVRG-like methods via a principle of energy conservation. The tools discussed here allow us to automate the convergence analysis of SVRG-like methods by capturing their essential properties in small semidefinite programs amenable to standard analysis and computational techniques. Our approach recovers existing convergence results for SVRG and Katyusha and generalizes the theory to alternative parameter choices. We also discuss how our approach complements the linear coupling technique. Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.
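To make the "energy conservation" idea concrete, here is the generic dissipativity certificate used in this line of work, written in the style of the control framework of Hu and Lessard; the specific matrices for SVRG and Katyusha are constructed in the paper, and the notation below is generic rather than theirs.

```latex
% Model the algorithm as a linear system driven by gradient information:
%   xi_{k+1} = A xi_k + B w_k.
% Storage function V(xi) = xi^T P xi (the "energy"), and supply rate
%   S(xi, w) = [xi; w]^T X [xi; w]
% encoding the known inequalities satisfied by the objective f.
% Dissipation inequality (energy decays up to the supplied amount):
%   V(xi_{k+1}) <= rho^2 V(xi_k) + S(xi_k, w_k),
% which holds along all trajectories whenever this small LMI is feasible:
\begin{equation*}
\begin{bmatrix}
A^{\top} P A - \rho^{2} P & A^{\top} P B \\
B^{\top} P A              & B^{\top} P B
\end{bmatrix} - X \preceq 0,
\qquad P \succeq 0 .
\end{equation*}
```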

We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for nonconvex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for the cubic regularization method. We show that our algorithm is guaranteed to converge to an $(\epsilon,\sqrt{\epsilon})$-approximate local minimum within $\tilde{O}(n^{4/5}/\epsilon^{3/2})$ second-order oracle calls, which outperforms the state-of-the-art cubic regularization algorithms, including subsampled cubic regularization. Our work also sheds light on the application of variance reduction techniques to high-order nonconvex optimization methods. Thorough experiments on various nonconvex optimization problems support our theory.
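As background, a minimal sketch of the cubic-regularized Newton step together with a simplified SVRG-style semi-stochastic gradient; SVRC's actual estimators add correction terms tailored to the cubic subproblem, and the symbols $\tilde{x}$, $F$, $M$ below are generic notation rather than the paper's.

```latex
% Cubic-regularized Newton step (Nesterov--Polyak): given a gradient
% estimate g_k and a Hessian estimate H_k, the next iterate solves
\begin{equation*}
x_{k+1} = x_k + \operatorname*{argmin}_{h}\;
  \langle g_k, h \rangle + \tfrac{1}{2}\, h^{\top} H_k h
  + \tfrac{M}{6}\, \lVert h \rVert^{3} .
\end{equation*}
% A simplified SVRG-style semi-stochastic gradient over a minibatch I:
\begin{equation*}
g_k = \nabla f_{I}(x_k) - \nabla f_{I}(\tilde{x}) + \nabla F(\tilde{x}),
\end{equation*}
% where \tilde{x} is the last snapshot and F = (1/n) \sum_i f_i.
% (SVRC's estimators refine this with further corrections; see the paper.)
```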

We investigate the behavior of higher-form symmetries at various quantum phase transitions. We consider discrete 1-form symmetries, which can be either part of the generalized concept "categorical symmetry" (labelled as $\tilde{Z}_N^{(1)}$) introduced recently, or an explicit $Z_N^{(1)}$ 1-form symmetry. We demonstrate that for many quantum phase transitions involving a $Z_N^{(1)}$ or $\tilde{Z}_N^{(1)}$ symmetry, the expectation value of the loop operator takes the form $\langle \left( \log O_\mathcal{C} \right)^2 \rangle \sim -\frac{A}{\epsilon} P + b \log P$, where $O_\mathcal{C}$ is an operator associated with a loop $\mathcal{C}$ (or its interior $\mathcal{A}$), which reduces to the Wilson loop operator for cases with an explicit $Z_N^{(1)}$ 1-form symmetry. $P$ is the perimeter of $\mathcal{C}$, and the $b \log P$ term arises from the sharp corners of the loop $\mathcal{C}$, which is consistent with recent numerics on a particular example. $b$ is a universal, microscopic-independent number, which in $(2+1)d$ is related to the universal conductivity at the quantum phase transition. $b$ can be computed exactly for certain transitions using the dualities between $(2+1)d$ conformal field theories developed in recent years. We also compute the "strange correlator" of $O_\mathcal{C}$: $S_{\mathcal{C}} = \langle 0 | O_\mathcal{C} | 1 \rangle / \langle 0 | 1 \rangle$, where $|0\rangle$ and $|1\rangle$ are many-body states with different topological nature.
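As a concrete anchor for the explicit $Z_N^{(1)}$ case, in standard $Z_N$ lattice gauge theory conventions (an illustrative choice, not the specific models of the paper) the loop operator is the Wilson loop:

```latex
% With Z_N link variables U_l = e^{2\pi i n_l / N}, n_l \in \{0,\dots,N-1\},
% the loop operator reduces to the Wilson loop
\begin{equation*}
O_{\mathcal{C}} = W_{\mathcal{C}} = \prod_{\ell \in \mathcal{C}} U_{\ell},
\end{equation*}
% and the 1-form symmetry acts on it by a Z_N phase whenever the
% codimension-one symmetry operator links the loop \mathcal{C}:
\begin{equation*}
Z_N^{(1)}:\; W_{\mathcal{C}} \mapsto e^{2\pi i k/N}\, W_{\mathcal{C}},
\qquad k \in \{0,\dots,N-1\}.
\end{equation*}
```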

We consider the allocation problem in which $m \leq (1-\epsilon) dn$ items are to be allocated to $n$ bins with capacity $d$. The items $x_1,x_2,\ldots,x_m$ arrive sequentially, and when item $x_i$ arrives it is given two possible bin locations $p_i=h_1(x_i)$, $q_i=h_2(x_i)$ via hash functions $h_1,h_2$. We consider a random-walk procedure for inserting items and show that the expected insertion time is constant provided $\epsilon = \Omega\left(\sqrt{\frac{\log d}{d}}\right)$.
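A sketch of the random-walk insertion procedure described above (the helper structure is hypothetical: `bins` is assumed to be a list of $n$ lists, each holding at most $d$ items, and the exact eviction rule is one natural reading of the abstract, not the authors' code):

```python
import random

# Random-walk insertion for the two-choice allocation problem:
# item x may go to bins h1(x) or h2(x). If both are full, evict a
# random occupant of a randomly chosen candidate bin and re-insert
# the evicted item the same way.

def insert(bins, d, item, h1, h2):
    steps = 0
    while True:
        steps += 1
        p, q = h1(item), h2(item)
        if len(bins[p]) < d:
            bins[p].append(item); return steps
        if len(bins[q]) < d:
            bins[q].append(item); return steps
        # Random-walk step: swap the item with a random occupant of a
        # randomly chosen one of its two candidate bins, then continue
        # with the evicted item.
        b = random.choice([p, q])
        j = random.randrange(d)
        bins[b][j], item = item, bins[b][j]
```

Each eviction is one step of the random walk; the result states that the expected number of such steps per insertion is $O(1)$ once $\epsilon = \Omega\left(\sqrt{\frac{\log d}{d}}\right)$.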