skip to main content

Title: Improved Convergence for $\ell_\infty$ and $\ell_1$ Regression via Iteratively Reweighted Least Squares
The iteratively reweighted least squares method (IRLS) is a popular technique used in practice for solving regression problems. Various versions of this method have been proposed, but their theoretical analyses failed to capture the good practical performance. In this paper we propose a simple and natural version of IRLS for solving $\ell_\infty$ and $\ell_1$ regression, which provably converges to a $(1+\epsilon)$-approximate solution in $O(m^{1/3}\log(1/\epsilon)/\epsilon^{2/3} + \log m/\epsilon^2)$ iterations, where $m$ is the number of rows of the input matrix. Interestingly, this running time is independent of the conditioning of the input, and the dominant term of the running time depends sublinearly in $\epsilon^{-1}$, which is atypical for the optimization of non-smooth functions. This improves upon the more complex algorithms of Chin et al. (ITCS '12), and Christiano et al. (STOC '11) by a factor of at least $1/\epsilon^2$, and yields a truly efficient natural algorithm for the slime mold dynamics (Straszak-Vishnoi, SODA '16, ITCS '16, ITCS '17).  more » « less
Award ID(s):
1718342 1750333
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We give two new quantum algorithms for solving semidefinite programs (SDPs) providing quantum speed-ups. We consider SDP instances with m constraint matrices, each of dimension n, rank at most r, and sparsity s. The first algorithm assumes an input model where one is given access to an oracle to the entries of the matrices at unit cost. We show that it has run time O~(s^2 (sqrt{m} epsilon^{-10} + sqrt{n} epsilon^{-12})), with epsilon the error of the solution. This gives an optimal dependence in terms of m, n and quadratic improvement over previous quantum algorithms (when m ~~ n). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is O~(sqrt{m}+poly(r))*poly(log m,log n,B,epsilon^{-1}), with B an upper bound on the trace-norm of all input matrices. In particular the complexity depends only polylogarithmically in n and polynomially in r. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: Given m measurements and a supply of copies of an unknown state rho with rank at most r, we show we can find in time sqrt{m}*poly(log m,log n,r,epsilon^{-1}) a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as rho on the m measurements, up to error epsilon. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a poly-logarithmic dependence on its dimension, which is based on ideas developed in quantum principal component analysis. We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. [Harrow et al., 2017]. We believe both techniques might be of independent interest. 
    more » « less
  2. Abstract

    In a Merlin–Arthur proof system, the proof verifier (Arthur) accepts valid proofs (from Merlin) with probability 1, and rejects invalid proofs with probability arbitrarily close to 1. The running time of such a system is defined to be the length of Merlin’s proof plus the running time of Arthur. We provide new Merlin–Arthur proof systems for some key problems in fine-grained complexity. In several cases our proof systems have optimal running time. Our main results include:

    Certifying that a list ofnintegers has no 3-SUM solution can be done in Merlin–Arthur time$$\tilde{O}(n)$$O~(n). Previously, Carmosino et al. [ITCS 2016] showed that the problem has a nondeterministic algorithm running in$$\tilde{O}(n^{1.5})$$O~(n1.5)time (that is, there is a proof system with proofs of length$$\tilde{O}(n^{1.5})$$O~(n1.5)and a deterministic verifier running in$$\tilde{O}(n^{1.5})$$O~(n1.5)time).

    Counting the number ofk-cliques with total edge weight equal to zero in ann-node graph can be done in Merlin–Arthur time$${\tilde{O}}(n^{\lceil k/2\rceil })$$O~(nk/2)(where$$k\ge 3$$k3). For oddk, this bound can be further improved for sparse graphs: for example, counting the number of zero-weight triangles in anm-edge graph can be done in Merlin–Arthur time$${\tilde{O}}(m)$$O~(m). Previous Merlin–Arthur protocols by Williams [CCC’16] and Björklund and Kaski [PODC’16] could only countk-cliques in unweighted graphs, and had worse running times for smallk.

    Computing the All-Pairs Shortest Distances matrix for ann-node graph can be done in Merlin–Arthur time$$\tilde{O}(n^2)$$O~(n2). Note this is optimal, as the matrix can have$$\Omega (n^2)$$Ω(n2)nonzero entries in general. Previously, Carmosino et al. [ITCS 2016] showed that this problem has an$$\tilde{O}(n^{2.94})$$O~(n2.94)nondeterministic time algorithm.

    Certifying that ann-variablek-CNF is unsatisfiable can be done in Merlin–Arthur time$$2^{n/2 - n/O(k)}$$2n/2-n/O(k). We also observe an algebrization barrier for the previous$$2^{n/2}\cdot \textrm{poly}(n)$$2n/2·poly(n)-time Merlin–Arthur protocol of R. Williams [CCC’16] for$$\#$$#SAT: in particular, his protocol algebrizes, and we observe there is no algebrizing protocol fork-UNSAT running in$$2^{n/2}/n^{\omega (1)}$$2n/2/nω(1)time. Therefore we have to exploit non-algebrizing properties to obtain our new protocol.

    Certifying a Quantified Boolean Formula is true can be done in Merlin–Arthur time$$2^{4n/5}\cdot \textrm{poly}(n)$$24n/5·poly(n). Previously, the only nontrivial result known along these lines was an Arthur–Merlin–Arthur protocol (where Merlin’s proof depends on some of Arthur’s coins) running in$$2^{2n/3}\cdot \textrm{poly}(n)$$22n/3·poly(n)time.

    Due to the centrality of these problems in fine-grained complexity, our results have consequences for many other problems of interest. For example, our work implies that certifying there is no Subset Sum solution tonintegers can be done in Merlin–Arthur time$$2^{n/3}\cdot \textrm{poly}(n)$$2n/3·poly(n), improving on the previous best protocol by Nederlof [IPL 2017] which took$$2^{0.49991n}\cdot \textrm{poly}(n)$$20.49991n·poly(n)time.

    more » « less
  3. We consider the problem of maximizing the multilinear extension of a submodular function subject a single matroid constraint or multiple packing constraints with a small number of adaptive rounds of evaluation queries. We obtain the first algorithms with low adaptivity for submodular maximization with a matroid constraint. Our algorithms achieve a $1-1/e-\epsilon$ approximation for monotone functions and a $1/e-\epsilon$ approximation for non-monotone functions, which nearly matches the best guarantees known in the fully adaptive setting. The number of rounds of adaptivity is $O(\log^2{n}/\epsilon^3)$, which is an exponential speedup over the existing algorithms. We obtain the first parallel algorithm for non-monotone submodular maximization subject to packing constraints. Our algorithm achieves a $1/e-\epsilon$ approximation using $O(\log(n/\epsilon) \log(1/\epsilon) \log(n+m)/ \epsilon^2)$ parallel rounds, which is again an exponential speedup in parallel time over the existing algorithms. For monotone functions, we obtain a $1-1/e-\epsilon$ approximation in $O(\log(n/\epsilon)\log(m)/\epsilon^2)$ parallel rounds. The number of parallel rounds of our algorithm matches that of the state of the art algorithm for solving packing LPs with a linear objective (Mahoney et al., 2016). Our results apply more generally to the problem of maximizing a diminishing returns submodular (DR-submodular) function. 
    more » « less
  4. Memory-hard functions (MHFs) are a key cryptographic primitive underlying the design of moderately expensive password hashing algorithms and egalitarian proofs of work. Over the past few years several increasingly stringent goals for an MHF have been proposed including the requirement that the MHF have high sequential space-time (ST) complexity, parallel space-time complexity, amortized area-time (aAT) complexity and sustained space complexity. Data-Independent Memory Hard Functions (iMHFs) are of special interest in the context of password hashing as they naturally resist side-channel attacks. iMHFs can be specified using a directed acyclic graph (DAG) $G$ with $N=2^n$ nodes and low indegree and the complexity of the iMHF can be analyzed using a pebbling game. Recently, Alwen et al. [CCS'17] constructed an DAG called DRSample which has aAT complexity at least $\Omega\left( N^2/\log N\right)$. Asymptotically DRSample outperformed all prior iMHF constructions including Argon2i, winner of the password hashing competition (aAT cost $\mathcal{O}\left(N^{1.767}\right)$), though the constants in these bounds are poorly understood. We show that the the greedy pebbling strategy of Boneh et al. [ASIACRYPT'16] is particularly effective against DRSample e.g., the aAT cost is $\mathcal{O}\left( N^2/\log N\right)$. In fact, our empirical analysis {\em reverses} the prior conclusion of Alwen et al. that DRSample provides stronger resistance to known pebbling attacks for practical values of $N \leq 2^{24}$. We construct a new iMHF candidate (DRSample+BRG) by using the bit-reversal graph to extend DRSample. We then prove that the construction is asymptotically optimal under every MHF criteria, and we empirically demonstrate that our iMHF provides the best resistance to {\em known} pebbling attacks. For example, we show that any parallel pebbling attack either has aAT cost $\omega(N^2)$ or requires at least $\Omega(N)$ steps with $\Omega(N/\log N)$ pebbles on the DAG. This makes our construction the first practical iMHF with a strong sustained space-complexity guarantee and immediately implies that any parallel pebbling has aAT complexity $\Omega(N^2/\log N)$. We also prove that any sequential pebbling (including the greedy pebbling attack) has aAT cost $\Omega\left( N^2\right)$ and, if a plausible conjecture holds, any parallel pebbling has aAT cost $\Omega(N^2 \log \log N/\log N)$ --- the best possible bound for an iMHF. We implement our new iMHF and demonstrate that it is just as fast as Argon2. Along the way we propose a simple modification to the Argon2 round function which increases an attacker's aAT cost by nearly an order of magnitude without increasing running time on a CPU. Finally, we give a pebbling reduction which proves that in the parallel random oracle model (PROM) the cost of evaluating an iMHF like Argon2i or DRSample+BRG is given by the pebbling cost of the underlying DAG. Prior pebbling reductions assumed that the iMHF round function concatenates input labels before hashing and did not apply to practical iMHFs such as Argon2i, DRSample or DRSample+BRG where input labels are instead XORed together. 
    more » « less
  5. We study local symmetry breaking problems in the Congest model, focusing on ruling set problems, which generalize the fundamental Maximal Independent Set (MIS) problem. The time (round) complexity of MIS (and ruling sets) have attracted much attention in the Local model. Indeed, recent results (Barenboim et al., FOCS 2012, Ghaffari SODA 2016) for the MIS problem have tried to break the long-standing O(log n)-round "barrier" achieved by Luby's algorithm, but these yield o(log n)-round complexity only when the maximum degree Delta is somewhat small relative to n. More importantly, these results apply only in the Local model. In fact, the best known time bound in the Congest model is still O(log n) (via Luby's algorithm) even for moderately small Delta (i.e., for Delta = Omega(log n) and Delta = o(n)). Furthermore, message complexity has been largely ignored in the context of local symmetry breaking. Luby's algorithm takes O(m) messages on m-edge graphs and this is the best known bound with respect to messages. Our work is motivated by the following central question: can we break the Theta(log n) time complexity barrier and the Theta(m) message complexity barrier in the Congest model for MIS or closely-related symmetry breaking problems? This paper presents progress towards this question for the distributed ruling set problem in the Congest model. A beta-ruling set is an independent set such that every node in the graph is at most beta hops from a node in the independent set. We present the following results: - Time Complexity: We show that we can break the O(log n) "barrier" for 2- and 3-ruling sets. We compute 3-ruling sets in O(log n/log log n) rounds with high probability (whp). More generally we show that 2-ruling sets can be computed in O(log Delta (log n)^(1/2 + epsilon) + log n/log log n) rounds for any epsilon > 0, which is o(log n) for a wide range of Delta values (e.g., Delta = 2^(log n)^(1/2-epsilon)). These are the first 2- and 3-ruling set algorithms to improve over the O(log n)-round complexity of Luby's algorithm in the Congest model. - Message Complexity: We show an Omega(n^2) lower bound on the message complexity of computing an MIS (i.e., 1-ruling set) which holds also for randomized algorithms and present a contrast to this by showing a randomized algorithm for 2-ruling sets that, whp, uses only O(n log^2 n) messages and runs in O(Delta log n) rounds. This is the first message-efficient algorithm known for ruling sets, which has message complexity nearly linear in n (which is optimal up to a polylogarithmic factor). 
    more » « less