Title: Efficient Convex Optimization Requires Superlinear Memory
We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, 1-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25-\delta}$ bits of memory must make at least $d^{1+4\delta/3}$ first-order queries (for any constant $\delta \in (0, 1/4)$). Consequently, the performance of such memory-constrained algorithms is a polynomial factor worse than the optimal $O(d \,\mathrm{polylog}\, d)$ query bound for this problem obtained by cutting plane methods that use more than $d^2$ memory. This resolves one of the open problems in the COLT 2019 open problem publication of Woodworth and Srebro.
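The bound above couples two exponents of $d$. The following is a minimal Python sketch, not taken from the paper, that evaluates them with polylogarithmic factors ignored. The helper name `tradeoff_exponents` is hypothetical, and exact rationals come from the standard `fractions` module. One consequence that is easy to verify: the memory exponent plus three quarters of the query exponent equals 2 for every admissible $\delta$. In other words, cutting memory by a factor of $d^{\delta}$ raises the query lower bound by a factor of $d^{4\delta/3}$.

```python
from fractions import Fraction


def tradeoff_exponents(delta):
    """Exponents of d in the stated lower bound (polylog factors ignored):
    memory <= d^(5/4 - delta) bits forces >= d^(1 + 4*delta/3) queries."""
    delta = Fraction(delta)
    if not Fraction(0) < delta < Fraction(1, 4):
        raise ValueError("delta must lie in the open interval (0, 1/4)")
    memory_exp = Fraction(5, 4) - delta
    query_exp = 1 + Fraction(4, 3) * delta
    return memory_exp, query_exp


# Sweep a few admissible values of delta and print the resulting exponents.
for delta in (Fraction(1, 16), Fraction(1, 8), Fraction(3, 16)):
    mem, qry = tradeoff_exponents(delta)
    print(f"delta={delta}: memory d^{mem}, queries >= d^{qry}")
```

As $\delta$ approaches $1/4$, the memory budget approaches linear ($d^{1}$) and the query lower bound approaches $d^{4/3}$.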
Award ID(s):
1813049 1704417
PAR ID:
10354702
Journal Name:
Conference on Learning Theory (COLT)
Sponsoring Org:
National Science Foundation
More Like this
  1. We show that any memory-constrained, first-order algorithm which minimizes $d$-dimensional, 1-Lipschitz convex functions over the unit ball to $1/\mathrm{poly}(d)$ accuracy using at most $d^{1.25-\delta}$ bits of memory must make at least $\tilde{\Omega}(d^{1+(4/3)\delta})$ first-order queries (for any constant $\delta \in [0,1/4]$). Consequently, the performance of such memory-constrained algorithms is a polynomial factor worse than the optimal $\tilde{O}(d)$ query bound for this problem obtained by cutting plane methods that use $\tilde{O}(d^2)$ memory. This resolves a COLT 2019 open problem of Woodworth and Srebro.
  2. We consider the problem of performing linear regression over a stream of $d$-dimensional examples, and show that any algorithm that uses a subquadratic amount of memory exhibits a slower rate of convergence than can be achieved without memory constraints. Specifically, consider a sequence of labeled examples $(a_1,b_1), (a_2,b_2), \ldots$, with $a_i$ drawn independently from a $d$-dimensional isotropic Gaussian, and where $b_i = \langle a_i, x \rangle + \eta_i$, for a fixed $x \in \mathbb{R}^d$ with $\|x\| = 1$ and with independent noise $\eta_i$ drawn uniformly from the interval $[-2^{-d/5}, 2^{-d/5}]$. We show that any algorithm with at most $d^2/4$ bits of memory requires at least $\Omega(d \log \log \frac{1}{\epsilon})$ samples to approximate $x$ to $\ell_2$ error $\epsilon$ with probability of success at least $2/3$, for $\epsilon$ sufficiently small as a function of $d$. In contrast, for such $\epsilon$, $x$ can be recovered to error $\epsilon$ with probability $1-o(1)$ with memory $O\left(d^2 \log(1/\epsilon)\right)$ using $d$ examples. These are the first nontrivial lower bounds for regression with super-linear memory, and they may open the door for strong memory/sample tradeoffs for continuous optimization.
  3. We suggest two related conjectures dealing with the existence of spanning irregular subgraphs of graphs. The first asserts that any $d$-regular graph on $n$ vertices contains a spanning subgraph in which the number of vertices of each degree between $0$ and $d$ deviates from $\frac{n}{d+1}$ by at most $2$. The second is that every graph on $n$ vertices with minimum degree $\delta$ contains a spanning subgraph in which the number of vertices of each degree does not exceed $\frac{n}{\delta +1}+2$. Both conjectures remain open, but we prove several asymptotic relaxations for graphs with a large number of vertices $n$. In particular we show that if $d^3 \log n \leq o(n)$ then every $d$-regular graph with $n$ vertices contains a spanning subgraph in which the number of vertices of each degree between $0$ and $d$ is $(1+o(1))\frac{n}{d+1}$. We also prove that any graph with $n$ vertices and minimum degree $\delta$ contains a spanning subgraph in which no degree is repeated more than $(1+o(1))\frac{n}{\delta +1}+2$ times.
  4. A conjecture of Kalai asserts that for $d\geq 4$, the affine type of a prime simplicial $d$-polytope $P$ can be reconstructed from the space of affine $2$-stresses of $P$. We prove this conjecture for all $d\geq 5$. We also prove the following generalization: for all pairs $(i,d)$ with $2\leq i\leq \lceil \frac d 2\rceil -1$, the affine type of a simplicial $d$-polytope $P$ that has no missing faces of dimension $\geq d-i+1$ can be reconstructed from the space of affine $i$-stresses of $P$. A consequence of our proofs is a strengthening of the Generalized Lower Bound Theorem: it was proved by Nagel that for any simplicial $(d-1)$-sphere $\Delta$ and $1\leq k\leq \lceil \frac {d}{2}\rceil -1$, $g_{k}(\Delta)$ is at least as large as the number of missing $(d-k)$-faces of $\Delta$; here we show that, for $1\leq k\leq \lfloor \frac {d}{2}\rfloor -1$, equality holds if and only if $\Delta$ is $k$-stacked. Finally, we show that for $d\geq 4$, any simplicial $d$-polytope $P$ that has no missing faces of dimension $\geq d-1$ is redundantly rigid, that is, for each edge $e$ of $P$, there exists an affine $2$-stress on $P$ with a non-zero value on $e$.
  5. This paper focuses on showing time-message trade-offs in distributed algorithms for fundamental problems such as leader election, broadcast, spanning tree (ST), minimum spanning tree (MST), minimum cut, and many graph verification problems. We consider the synchronous CONGEST distributed computing model and assume that each node has initial knowledge of itself and the identifiers of its neighbors - the so-called KT_1 model - a well-studied model that also naturally arises in many applications. Recently, it has been established that one can obtain (almost) singularly optimal algorithms, i.e., algorithms that have simultaneously optimal time and message complexity (up to polylogarithmic factors), for many fundamental problems in the standard KT_0 model (where nodes have only local knowledge of themselves and not their neighbors). The situation is less clear in the KT_1 model. In this paper, we present several new distributed algorithms in the KT_1 model that trade off between time and message complexity. Our distributed algorithms are based on a uniform and general approach which involves constructing a sparsified spanning subgraph of the original graph - called a danner - that trades off the number of edges with the diameter of the sparsifier. In particular, a key ingredient of our approach is a distributed randomized algorithm that, given a graph G and any delta in [0,1], with high probability constructs a danner that has diameter O~(D + n^{1-delta}) and O~(min{m,n^{1+delta}}) edges in O~(n^{1-delta}) rounds while using O~(min{m,n^{1+delta}}) messages, where n, m, and D are the number of nodes, edges, and the diameter of G, respectively. Using our danner construction, we present a family of distributed randomized algorithms for various fundamental problems that exhibit a trade-off between message and time complexity and that improve over previous results. 
    Specifically, we show the following results (all hold with high probability) in the KT_1 model, which subsume and improve over prior bounds in the KT_1 model (King et al., PODC 2014 and Awerbuch et al., JACM 1990) and the KT_0 model (Kutten et al., JACM 2015, Pandurangan et al., STOC 2017 and Elkin, PODC 2017):
    1) Leader Election, Broadcast, and ST. These problems can be solved in O~(D+n^{1-delta}) rounds using O~(min{m,n^{1+delta}}) messages for any delta in [0,1].
    2) MST and Connectivity. These problems can be solved in O~(D+n^{1-delta}) rounds using O~(min{m,n^{1+delta}}) messages for any delta in [0,0.5]. In particular, for delta = 0.5 we obtain a distributed MST algorithm that runs in optimal O~(D+sqrt{n}) rounds and uses O~(min{m,n^{3/2}}) messages. We note that this improves over the singularly optimal algorithm in the KT_0 model that uses O~(D+sqrt{n}) rounds and O~(m) messages.
    3) Minimum Cut. O(log n)-approximate minimum cut can be solved in O~(D+n^{1-delta}) rounds using O~(min{m,n^{1+delta}}) messages for any delta in [0,0.5].
    4) Graph Verification Problems such as Bipartiteness, Spanning Subgraph, etc. These can be solved in O~(D+n^{1-delta}) rounds using O~(min{m,n^{1+delta}}) messages for any delta in [0,0.5].
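The data model in item 2 above (streaming linear regression), together with its quadratic-memory baseline, can be illustrated with a small pure-Python sketch. The sketch rests on several assumptions and is not the paper's algorithm. It buffers the first $d$ examples (about $d^2$ stored numbers, with Python floats standing in for $O(\log(1/\epsilon))$-bit values) and solves the resulting square linear system by Gaussian elimination. When noise is present, this naive solve recovers $x$ only to within an error governed by the noise level and the conditioning of the sampled matrix, so it is not the $\epsilon$-accurate recovery the abstract refers to. All function names are hypothetical.

```python
import math
import random


def sample_stream(x, num_examples, noise_scale, rng):
    """Yield labeled examples (a_i, b_i) with a_i ~ N(0, I_d) and
    b_i = <a_i, x> + eta_i, where eta_i ~ Uniform[-noise_scale, noise_scale]."""
    d = len(x)
    for _ in range(num_examples):
        a = [rng.gauss(0.0, 1.0) for _ in range(d)]
        eta = rng.uniform(-noise_scale, noise_scale)
        yield a, sum(ai * xi for ai, xi in zip(a, x)) + eta


def solve(rows, rhs):
    """Solve the square system rows * z = rhs by Gaussian elimination
    with partial pivoting."""
    n = len(rows)
    aug = [list(r) + [v] for r, v in zip(rows, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def recover_from_d_examples(stream, d):
    """Buffer the first d examples (about d^2 stored numbers, i.e. quadratic
    memory) and solve the resulting d x d linear system for x."""
    rows, rhs = [], []
    for a, b in stream:
        rows.append(a)
        rhs.append(b)
        if len(rows) == d:
            break
    return solve(rows, rhs)


rng = random.Random(0)
d = 20
g = [rng.gauss(0.0, 1.0) for _ in range(d)]
norm = math.sqrt(sum(v * v for v in g))
x = [v / norm for v in g]  # unit-norm target, ||x|| = 1
z = recover_from_d_examples(sample_stream(x, d, 2 ** (-d / 5), rng), d)
print("l2 error:", math.dist(z, x))
```

If the noise scale is set to zero, the same routine recovers $x$ up to floating-point error. This shows that the error in the noisy run comes from the $\eta_i$ and not from the solver.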
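The first conjecture in item 3 above can be checked exhaustively on very small graphs. The sketch below uses hypothetical helper names and a brute-force search over every edge subset. It computes the smallest achievable maximum deviation of the degree counts from $\frac{n}{d+1}$. It checks individual instances only and says nothing about the conjecture in general. For the 5-cycle the optimum is $2/3$, because five vertices split over three degree classes force some class to have at most one vertex. For $K_4$ the optimum is $1$, because a vertex of degree 3 rules out a vertex of degree 0. Both values lie within the conjectured bound of $2$.

```python
import math
from collections import Counter
from itertools import combinations


def degree_counts(n, edges):
    """Counter mapping each degree to the number of vertices with that degree."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return Counter(deg)


def best_max_deviation(n, d, edges):
    """Over all spanning subgraphs (all n vertices, any subset of edges) of a
    d-regular graph, minimise max_k |#{vertices of degree k} - n/(d+1)|, k = 0..d.
    Exhaustive search over 2^|E| subsets, so only usable on tiny graphs."""
    target = n / (d + 1)
    best = math.inf
    for size in range(len(edges) + 1):
        for subset in combinations(edges, size):
            counts = degree_counts(n, subset)  # missing degrees count as 0
            dev = max(abs(counts[k] - target) for k in range(d + 1))
            best = min(best, dev)
    return best


cycle5 = [(i, (i + 1) % 5) for i in range(5)]  # C_5 is 2-regular
k4 = list(combinations(range(4), 2))           # K_4 is 3-regular
print(best_max_deviation(5, 2, cycle5))  # about 2/3
print(best_max_deviation(4, 3, k4))      # 1.0
```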
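Item 4 above states the strengthened Generalized Lower Bound Theorem in terms of the numbers $g_k(\Delta)$. As an illustration that uses the standard conversion from f-vector to h-vector (this is not code from the paper), the sketch below computes the h- and g-vectors of a simplicial $(d-1)$-sphere from its face numbers. The convention is $h_k = \sum_{i=0}^{k} (-1)^{k-i} \binom{d-i}{k-i} f_{i-1}$ with $f_{-1} = 1$, and $g_k = h_k - h_{k-1}$. The sketch does not count missing faces or test $k$-stackedness, and the function names are hypothetical.

```python
from math import comb


def h_vector(d, f):
    """h-vector of a simplicial (d-1)-sphere with f = [f_0, ..., f_{d-1}]
    (f_j = number of j-dimensional faces), via
    h_k = sum_{i=0}^{k} (-1)^(k-i) * C(d-i, k-i) * f_{i-1}, with f_{-1} = 1."""
    if len(f) != d:
        raise ValueError("expected d face numbers f_0, ..., f_{d-1}")
    fx = [1] + list(f)  # fx[i] holds f_{i-1}
    return [
        sum((-1) ** (k - i) * comb(d - i, k - i) * fx[i] for i in range(k + 1))
        for k in range(d + 1)
    ]


def g_vector(d, f):
    """g_0 = 1 and g_k = h_k - h_{k-1} for 1 <= k <= floor(d/2)."""
    h = h_vector(d, f)
    return [h[0]] + [h[k] - h[k - 1] for k in range(1, d // 2 + 1)]


# Boundary of the octahedron: a simplicial 2-sphere (d = 3) with f = (6, 12, 8).
print(h_vector(3, [6, 12, 8]))  # [1, 3, 3, 1]
print(g_vector(3, [6, 12, 8]))  # [1, 2]
```

The boundary of a simplex gives an all-ones h-vector and vanishing $g_k$ for $k \geq 1$. This makes it a quick sanity check for the sign convention.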
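The danner guarantees in item 5 above trade rounds against messages through the single parameter delta. The sketch below only evaluates the shape of the quoted bounds, with the polylogarithmic factors hidden by O~ dropped. Its outputs are not round or message counts of any actual execution, and the helper name `danner_bounds` is hypothetical. According to the abstract, the danner construction itself accepts any delta in [0,1], while the MST, connectivity, minimum-cut, and verification results are stated only for delta in [0,0.5].

```python
def danner_bounds(n, m, diameter, delta):
    """Evaluate the danner bounds quoted in item 5 with polylogarithmic
    factors dropped: diameter D + n^(1-delta), edges and messages
    min(m, n^(1+delta)), rounds n^(1-delta)."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    sparse_size = min(m, n ** (1 + delta))
    return {
        "danner_diameter": diameter + n ** (1 - delta),
        "danner_edges": sparse_size,
        "rounds": n ** (1 - delta),
        "messages": sparse_size,
    }


# A dense graph: n = 10^4 nodes, m = 10^7 edges, diameter D = 10.
for delta in (0.0, 0.5, 1.0):
    print(delta, danner_bounds(10**4, 10**7, 10, delta))
```

As delta grows, the round bound shrinks from n toward 1 while the message bound grows until it is capped by m. This is the time-message trade-off the abstract describes.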