Parameter-free stochastic gradient descent (PFSGD) algorithms do not require setting learning rates while achieving optimal theoretical performance. In practical applications, however, there remains an empirical gap between tuned stochastic gradient descent (SGD) and PFSGD. In this paper, we close the empirical gap with a new parameter-free algorithm based on continuous-time Coin-Betting on truncated models. The new update is derived through the solution of an Ordinary Differential Equation (ODE) and has a closed form. We show empirically that this new parameter-free algorithm outperforms algorithms with the "best default" learning rates and almost matches the performance of finely tuned baselines, without anything to tune.
Implicit Parameter-free Online Learning with Truncated Linear Models
Parameter-free algorithms are online learning algorithms that do not require setting learning rates. They achieve optimal regret with respect to the distance between the initial point and any competitor. Yet, parameter-free algorithms do not take into account the geometry of the losses. Recently, in the stochastic optimization literature, it has been proposed to instead use truncated linear lower bounds, which produce better performance by more closely modeling the losses. In particular, truncated linear models greatly reduce the problem of overshooting the minimum of the loss function. Unfortunately, truncated linear models cannot be used with parameter-free algorithms because the updates become very expensive to compute. In this paper, we propose new parameter-free algorithms that can take advantage of truncated linear models through a new update that has an “implicit” flavor. Based on a novel decomposition of the regret, the new update is efficient, requires only one gradient at each step, never overshoots the minimum of the truncated model, and retains the favorable parameter-free properties. We also conduct an empirical study demonstrating the practical utility of our algorithms.
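To make the truncated-model idea concrete, here is a minimal sketch of a gradient step on a truncated linear lower bound. This is an illustration of the general technique, not the paper's implicit parameter-free update; the known lower bound `f_low` and the function/step names are assumptions for the example.

```python
import numpy as np

def truncated_model_step(x, f_x, g, lr, f_low=0.0):
    """One gradient step on the truncated linear model
    h(y) = max(f(x) + <g, y - x>, f_low).
    The step length is capped at the point where the linear model
    reaches the truncation level f_low, so the iterate never
    overshoots the minimum of the truncated model.
    Illustrative sketch; f_low is an assumed known lower bound."""
    g_norm_sq = np.dot(g, g)
    if g_norm_sq == 0.0:
        return x
    # Largest useful step: where the linear model hits f_low.
    eta = min(lr, (f_x - f_low) / g_norm_sq)
    return x - eta * g
```

For example, on f(x) = x²/2 starting at x = 2 with a huge learning rate of 10, plain SGD would jump to −18, while the truncated step stops at x = 1, where the linear model reaches the lower bound 0.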
NSF-PAR ID: 10389018
Editor(s): Dasgupta, Sanjoy; Haghtalab, Nika
Journal Name: International Conference on Algorithmic Learning Theory
Sponsoring Org: National Science Foundation
More Like this


We introduce several new black-box reductions that significantly improve the design of adaptive and parameter-free online learning algorithms by simplifying analysis, improving regret guarantees, and sometimes even improving runtime. We reduce parameter-free online learning to online exp-concave optimization, we reduce optimization in a Banach space to one-dimensional optimization, and we reduce optimization over a constrained domain to unconstrained optimization. All of our reductions run as fast as online gradient descent. We use our new techniques to improve upon the previously best regret bounds for parameter-free learning, and do so for arbitrary norms.
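A standard way to see how one-dimensional parameter-free learning works is the classical Krichevsky-Trofimov (KT) coin-betting reduction: treat the negated gradients as coin outcomes, bet a KT fraction of the current wealth, and play the bet as the iterate. The sketch below shows this well-known reduction, not the specific reductions of the paper; names and the initial-wealth parameter are illustrative.

```python
def kt_coin_betting(grads, eps=1.0):
    """1-d parameter-free online linear optimization via KT
    coin-betting: coin c_t = -g_t, bet x_t = (sum of past coins / t)
    times the current wealth. Assumes |g_t| <= 1; eps is the initial
    wealth. Returns the sequence of iterates x_1, x_2, ...
    Illustrative sketch of the classical reduction."""
    wealth = eps
    coin_sum = 0.0            # running sum of past coins -g_s
    xs = []
    for t, g in enumerate(grads, start=1):
        x = (coin_sum / t) * wealth   # KT bet; x_1 = 0
        xs.append(x)
        wealth -= g * x               # linear gain/loss this round
        coin_sum += -g
    return xs
```

When the "coin" is biased (e.g. every gradient is −1), the wealth compounds and the iterates grow automatically, which is exactly the mechanism that replaces a hand-tuned learning rate.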

Expander graphs play a central role in graph theory and algorithms. With a number of powerful algorithmic tools developed around them, such as the Cut-Matching game, expander pruning, expander decomposition, and algorithms for decremental All-Pairs Shortest Paths (APSP) in expanders, to name just a few, the use of expanders in the design of graph algorithms has become ubiquitous. Specific applications of interest to us are fast deterministic algorithms for cut problems in static graphs, and algorithms for dynamic distance-based graph problems, such as APSP. Unfortunately, the use of expanders in these settings incurs a number of drawbacks. For example, the best currently known algorithm for decremental APSP in constant-degree expanders can only achieve a $(\log n)^{O(1/\epsilon^2)}$-approximation with $n^{1+O(\epsilon)}$ total update time for any $\epsilon$. All currently known algorithms for the Cut Player in the Cut-Matching game are either randomized, or provide rather weak guarantees: expansion $1/(\log n)^{1/\epsilon}$ with running time $n^{1+O(\epsilon)}$. This, in turn, leads to somewhat weak algorithmic guarantees for several central cut problems: the best current almost-linear time deterministic algorithms for Sparsest Cut, Lowest-Conductance Cut, and Balanced Cut can only achieve an approximation factor of $(\log n)^{\omega(1)}$. Lastly, when relying on expanders in distance-based problems, such as dynamic APSP, via current methods, it seems inevitable that one has to settle for approximation factors that are at least $\Omega(\log n)$. In contrast, we do not have any negative results that rule out a factor-5 approximation with near-linear total update time. In this paper we propose the use of well-connected graphs, and introduce a new algorithmic toolkit for such graphs that, in a sense, mirrors the above-mentioned algorithmic tools for expanders. One of these new tools is the Distanced Matching game, an analogue of the Cut-Matching game for well-connected graphs.
We demonstrate the power of these new tools by obtaining better results for several of the problems mentioned above. First, we design an algorithm for decremental APSP in expanders with significantly better guarantees: in a constant-degree expander, the algorithm achieves a $(\log n)^{1+o(1)}$-approximation, with total update time $n^{1+o(1)}$. We also obtain a deterministic algorithm for the Cut Player in the Cut-Matching game that achieves expansion $1/(\log n)^{5+o(1)}$ in time $n^{1+o(1)}$, deterministic almost linear-time algorithms for Sparsest Cut, Lowest-Conductance Cut, and Minimum Balanced Cut with approximation factors $O(\mathrm{poly}\log n)$, as well as an improved deterministic algorithm for Expander Decomposition. We believe that the use of well-connected graphs instead of expanders in various dynamic distance-based problems (such as APSP in general graphs) has the potential of providing much stronger guarantees, since we are no longer necessarily restricted to super-logarithmic approximation factors.

We consider the classical Minimum Balanced Cut problem: given a graph $G$, compute a partition of its vertices into two subsets of roughly equal volume, while minimizing the number of edges connecting the subsets. We present the first {\em deterministic, almost-linear time} approximation algorithm for this problem. Specifically, our algorithm, given an $n$-vertex $m$-edge graph $G$ and any parameter $1\leq r\leq O(\log n)$, computes a $(\log m)^{r^2}$-approximation for Minimum Balanced Cut on $G$, in time $O\left( m^{1+O(1/r)+o(1)}\cdot (\log m)^{O(r^2)}\right)$. In particular, we obtain a $(\log m)^{1/\epsilon}$-approximation in time $m^{1+O(1/\sqrt{\epsilon})}$ for any constant $\epsilon$, and a $(\log m)^{f(m)}$-approximation in time $m^{1+o(1)}$, for any slowly growing function $f(m)$. We obtain deterministic algorithms with similar guarantees for the Sparsest Cut and the Lowest-Conductance Cut problems. Our algorithm for the Minimum Balanced Cut problem in fact provides a stronger guarantee: it either returns a balanced cut whose value is close to a given target value, or it certifies that such a cut does not exist by exhibiting a large subgraph of $G$ that has high conductance. We use this algorithm to obtain deterministic algorithms for dynamic connectivity and minimum spanning forest, whose worst-case update time on an $n$-vertex graph is $n^{o(1)}$, thus resolving a major open problem in the area of dynamic graph algorithms. Our work also implies deterministic algorithms for a host of additional problems, whose time complexities match, up to subpolynomial in $n$ factors, those of known randomized algorithms. The implications include almost-linear time deterministic algorithms for solving Laplacian systems and for approximating maximum flows in undirected graphs.
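To ground the objective being approximated, here is a small sketch that evaluates a candidate balanced cut: it counts the crossing edges and checks that the two sides have roughly equal volume (sum of degrees). This illustrates only the problem's objective, not the paper's algorithm; the function name and the specific balance criterion are assumptions for the example.

```python
def balanced_cut_objective(adj, side_a):
    """Evaluate a candidate Minimum Balanced Cut.
    adj maps each vertex to its set of neighbors (undirected graph);
    side_a is one side of the proposed partition.
    Returns (number of crossing edges, whether the cut is roughly
    balanced by volume). Illustrative objective check only."""
    side_a = set(side_a)
    crossing = sum(1 for u in side_a for v in adj[u] if v not in side_a)
    vol_a = sum(len(adj[u]) for u in side_a)
    vol_b = sum(len(adj[u]) for u in adj if u not in side_a)
    # Rough balance criterion: smaller side holds at least half
    # the volume of the larger side.
    balanced = min(vol_a, vol_b) >= max(vol_a, vol_b) / 2
    return crossing, balanced
```

On a 4-cycle, splitting the vertices into two adjacent pairs cuts exactly two edges and is perfectly balanced, which is the optimum for that graph.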

Federated multi-armed bandits (FMAB) is a new bandit paradigm that parallels the federated learning (FL) framework in supervised learning. It is inspired by practical applications in cognitive radio and recommender systems, and enjoys features that are analogous to FL. This paper proposes a general framework of FMAB and then studies two specific federated bandit models. We first study the approximate model, where the heterogeneous local models are random realizations of the global model from an unknown distribution. This model introduces a new uncertainty of client sampling, as the global model may not be reliably learned even if the finite local models are perfectly known. Furthermore, this uncertainty cannot be quantified a priori without knowledge of the suboptimality gap. We solve the approximate model by proposing Federated Double UCB (Fed2UCB), which constructs a novel “double UCB” principle accounting for uncertainties from both arm and client sampling. We show that gradually admitting new clients is critical in achieving an $O(\log T)$ regret while explicitly considering the communication loss. The exact model, where the global bandit model is the exact average of heterogeneous local models, is then studied as a special case. We show that, somewhat surprisingly, order-optimal regret can be achieved independent of the number of clients with a careful choice of the update periodicity. Experiments using both synthetic and real-world datasets corroborate the theoretical analysis and demonstrate the effectiveness and efficiency of the proposed algorithms.
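For readers unfamiliar with the single-source confidence bound that a "double UCB" principle layers a second (client-sampling) bound on top of, here is a sketch of the classical UCB1 index. This is background only, not the paper's Fed2UCB algorithm; the function signature is an assumption for the example.

```python
import math

def ucb1(pull, n_arms, horizon):
    """Classical UCB1 for stochastic bandits with rewards in [0, 1].
    pull(a) returns a reward for arm a. Each round, play the arm
    maximizing empirical mean + sqrt(2 log t / n_a), the confidence
    bonus that shrinks as an arm is sampled more. Returns per-arm
    pull counts. Background sketch, not Fed2UCB itself."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:          # initialization: play each arm once
            a = t - 1
        else:
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return counts
```

The "double UCB" idea described in the abstract adds a second confidence term of this kind to account for the uncertainty from sampling clients, on top of the usual uncertainty from sampling arms.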