

Title: Learning Narrow One-Hidden-Layer ReLU Networks
We consider the well-studied problem of learning a linear combination of k ReLU activations with respect to a Gaussian distribution on inputs in d dimensions. We give the first polynomial-time algorithm that succeeds whenever k is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned. Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.
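To make the tensor-contraction idea concrete, here is a minimal numerical sketch. It is not the paper's algorithm (which estimates moment tensors from Gaussian samples and handles near-parallel neurons via a multi-scale analysis): given an exact fourth-order tensor of the form sum_i a_i w_i^{⊗4}, two random contractions followed by a Jennrich-style simultaneous-diagonalization step recover the hidden directions when the w_i are linearly independent. The dimensions, variable names, and the diagonalization step are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): recovering hidden directions from
# a 4th-order moment-style tensor T = sum_i a_i * w_i^{(x)4} via two random
# contractions and a simultaneous-diagonalization step (Jennrich-style).
# All sizes and names (d, k, a, W) are illustrative assumptions.

rng = np.random.default_rng(0)
d, k = 8, 3
W = rng.standard_normal((k, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # unit-norm hidden weight vectors
a = rng.uniform(0.5, 1.5, size=k)               # combining coefficients

# Build T = sum_i a_i * w_i (x) w_i (x) w_i (x) w_i  (in practice estimated from data).
T = np.einsum('i,ij,ik,il,im->jklm', a, W, W, W, W)

# Contract the last two modes with independent random vectors u and v.
u, v = rng.standard_normal(d), rng.standard_normal(d)
Mu = np.einsum('jklm,l,m->jk', T, u, u)   # sum_i a_i (w_i.u)^2 w_i w_i^T
Mv = np.einsum('jklm,l,m->jk', T, v, v)

# Eigenvectors of Mu @ pinv(Mv) recover the w_i (up to sign/scale) when the w_i
# are linearly independent; nearly parallel neurons are exactly the
# ill-conditioned case the paper's multi-scale analysis is designed to handle.
eigvals, eigvecs = np.linalg.eig(Mu @ np.linalg.pinv(Mv))
recovered = np.real(eigvecs[:, np.argsort(-np.abs(eigvals))[:k]]).T
recovered /= np.linalg.norm(recovered, axis=1, keepdims=True)
print(np.abs(recovered @ W.T))   # approximately a permutation matrix
```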
Award ID(s):
1909204
PAR ID:
10478029
Author(s) / Creator(s):
Publisher / Repository:
COLT 2023
Date Published:
Journal Name:
Conference on Learning Theory
Format(s):
Medium: X
Location:
Bangalore, India
Sponsoring Org:
National Science Foundation
More Like this
  1.
    We give new and efficient black-box reconstruction algorithms for some classes of depth-3 arithmetic circuits. As a consequence, we obtain the first efficient algorithm for computing the tensor rank and for finding the optimal tensor decomposition as a sum of rank-one tensors when the input is a constant-rank tensor. More specifically, we provide efficient learning algorithms that run in randomized polynomial time over general fields and in deterministic polynomial time over certain fields for the following classes: 1) Set-multilinear depth-3 circuits of constant top fan-in (set-multilinear ΣΠΣ(k) circuits). As a consequence of our algorithm, we obtain the first polynomial-time algorithm for tensor rank computation and optimal tensor decomposition of constant-rank tensors. This result holds for d-dimensional tensors for any d, but is interesting even for d = 3. 2) Sums of powers of constantly many linear forms (Σ∧Σ(k) circuits). As a consequence, we obtain the first polynomial-time algorithm for tensor rank computation and optimal tensor decomposition of constant-rank symmetric tensors. 3) Multilinear depth-3 circuits of constant top fan-in (multilinear ΣΠΣ(k) circuits). Our algorithm works over all fields of characteristic 0 or large enough characteristic. Prior to our work, the only efficient algorithms known were over polynomially sized finite fields (see Karnin-Shpilka '09). Prior to our work, the only polynomial-time or even subexponential-time algorithms known (deterministic or randomized) for subclasses of ΣΠΣ(k) circuits that also work over large/infinite fields were for the setting when the top fan-in k is at most 2 (see Sinha '16 and Sinha '20).
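For context on the tensor-decomposition task in the abstract above, the following is a toy sketch using alternating least squares (ALS), a standard heuristic. It is not the paper's reconstruction algorithm and carries none of its worst-case guarantees; the tensor sizes, the `rank_guess` loop, and the stopping idea are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: decomposing a constant-rank 3-tensor into rank-one
# terms by alternating least squares (ALS). This is a standard heuristic, not
# the paper's algorithm; it may need more iterations or restarts to converge.

rng = np.random.default_rng(1)
n, true_rank = 6, 3
A0, B0, C0 = (rng.standard_normal((n, true_rank)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)        # ground-truth rank-3 tensor

def als_decompose(T, rank_guess, iters=500):
    n1, n2, n3 = T.shape
    A, B, C = (rng.standard_normal((m, rank_guess)) for m in (n1, n2, n3))
    for _ in range(iters):
        # Each update solves a linear least-squares problem in one factor.
        A = np.einsum('ijk,jr,kr->ir', T, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', T, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', T, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    residual = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C))
    return (A, B, C), residual

# One crude way to probe the rank: try r = 1, 2, ... until the residual vanishes.
for r in range(1, 5):
    _, res = als_decompose(T, r)
    print(r, f"{res:.2e}")
```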
  2. Megow, Nicole; Smith, Adam (Ed.)
    We prove that for some constant a > 1, for all k ≤ a, MATIME[n^{k(1+o(1))}]/1 ⊄ SIZE[O(n^k)], for some specific o(1) function. This is a super-linear polynomial circuit lower bound. Previously, Santhanam [Santhanam, 2007] showed that there exists a constant c > 1 such that for all k > 1: MATIME[n^{ck}]/1 ⊄ SIZE[O(n^k)]. Inherent to Santhanam's proof, c is a large constant and there is no upper bound on c. Using ideas from Murray and Williams [Murray and Williams, 2018], one can show for all k > 1: MATIME[n^{10k²}]/1 ⊄ SIZE[O(n^k)]. To prove this result, we construct the first PCP for SPACE[n] with quasi-linear verifier time: our PCP has an Õ(n)-time verifier, an Õ(n)-space prover, O(log n) queries, and a polynomial-size alphabet. Prior to this work, PCPs for SPACE[O(n)] had verifiers that ran in Ω(n²) time. This PCP also proves that NE has MIP verifiers which run in time Õ(n).
  3. Given a sample of n points from R^d and a neural network with a fixed architecture with W weights, k neurons, linear-threshold activation functions, and binary outputs on each neuron, we study the problem of uniformly sampling from all possible labelings of the sample corresponding to different choices of weights. We provide an algorithm that runs in time polynomial in both n and W such that any labeling appears with probability at least (W/(2ekn))^W for W < n. For a single neuron, we also provide a random-walk-based algorithm that samples exactly uniformly.
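As a small illustration of the object being sampled in the abstract above: a single linear-threshold neuron (w, b) labels each of the n sample points by the sign of w·x + b, and naive random choices of (w, b) reach labelings very non-uniformly, which is the gap the paper's samplers address. The point set, trial count, and Gaussian weight distribution below are assumptions for illustration, not the paper's method.

```python
import numpy as np

# Sketch of the labeling map for a single linear-threshold neuron: a weight
# vector (w, b) labels each of the n sample points by the sign of w.x + b.
# Drawing (w, b) at random gives *a* labeling, but highly non-uniformly over
# the set of achievable labelings; the duplicate counting below only
# enumerates labelings, it is not a uniform sampler.

rng = np.random.default_rng(2)
n, d = 8, 2
X = rng.standard_normal((n, d))                  # the sample of n points from R^d

def labeling(w, b):
    return tuple((X @ w + b >= 0).astype(int))

# Empirically enumerate labelings reachable by random Gaussian neurons.
seen = {}
for _ in range(50_000):
    w, b = rng.standard_normal(d), rng.standard_normal()
    lab = labeling(w, b)
    seen[lab] = seen.get(lab, 0) + 1

print("distinct labelings found:", len(seen))
# The hit counts are far from equal, i.e. naive weight sampling is not a
# uniform sampler over labelings.
print("most / least frequent hit counts:", max(seen.values()), min(seen.values()))
```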
  4. Multiple known algorithmic paradigms (backtracking, local search and the polynomial method) only yield a 2^{n(1−1/O(k))} time algorithm for k-SAT in the worst case. For this reason, it has been hypothesized that the worst-case k-SAT problem cannot be solved in 2^{n(1−f(k)/k)} time for any unbounded function f. This hypothesis has been called the "Super-Strong ETH", modeled after the ETH and the Strong ETH. We give two results on the Super-Strong ETH: 1. It has also been hypothesized that k-SAT is hard to solve for randomly chosen instances near the "critical threshold", where the clause-to-variable ratio is 2^k ln 2 − Θ(1). We give a randomized algorithm which refutes the Super-Strong ETH for the case of random k-SAT and planted k-SAT for any clause-to-variable ratio. For example, given any random k-SAT instance F with n variables and m clauses, our algorithm decides satisfiability for F in 2^{n(1−Ω(log k)/k)} time, with high probability (over the choice of the formula and the randomness of the algorithm). It turns out that a well-known algorithm from the literature on SAT algorithms does the job: the PPZ algorithm of Paturi, Pudlák and Zane [17]. 2. The Unique k-SAT problem is the special case where there is at most one satisfying assignment. Improving prior reductions, we show that the Super-Strong ETHs for Unique k-SAT and k-SAT are equivalent. More precisely, we show the time complexities of Unique k-SAT and k-SAT are very tightly correlated: if Unique k-SAT is in 2^{n(1−f(k)/k)} time for an unbounded f, then k-SAT is in 2^{n(1−f(k)(1−ε)/k)} time for every ε > 0.
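Below is a minimal sketch of the PPZ algorithm of Paturi, Pudlák and Zane mentioned in the abstract above: process the variables in a random order, setting a variable according to a unit clause whenever one forces it and by a fair coin otherwise, and repeat many times. The clause encoding and parameter defaults are assumptions for the sketch.

```python
import random

# Minimal sketch of the PPZ algorithm: one pass succeeds with probability
# roughly 2^{-n(1-1/k)} on satisfiable k-CNFs, so it is repeated many times.

def ppz_pass(num_vars, clauses, rng):
    """One randomized PPZ pass. `clauses` is a list of tuples of nonzero ints;
    literal v means variable v is True, -v means variable v is False."""
    assignment = {}
    order = list(range(1, num_vars + 1))
    rng.shuffle(order)
    for var in order:
        forced = None
        for clause in clauses:
            satisfied = any(assignment[abs(lit)] == (lit > 0)
                            for lit in clause if abs(lit) in assignment)
            unassigned = [lit for lit in clause if abs(lit) not in assignment]
            # A unit clause on `var` forces its value.
            if not satisfied and len(unassigned) == 1 and abs(unassigned[0]) == var:
                forced = unassigned[0] > 0
                break
        assignment[var] = forced if forced is not None else (rng.random() < 0.5)
    ok = all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
             for clause in clauses)
    return assignment if ok else None

def ppz(num_vars, clauses, tries=100_000, seed=0):
    rng = random.Random(seed)
    for _ in range(tries):
        result = ppz_pass(num_vars, clauses, rng)
        if result is not None:
            return result
    return None  # "probably unsatisfiable"

# Tiny 3-SAT example: (x1 or x2 or not x3)(not x1 or x3 or x4)(not x2 or not x4 or x3)
print(ppz(4, [(1, 2, -3), (-1, 3, 4), (-2, -4, 3)]))
```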
  5.
    Finding locally optimal solutions for MAX-CUT and MAX-k-CUT is a well-known PLS-complete problem. An instinctive approach to finding such a locally optimal solution is the FLIP method. Even though FLIP requires exponential time on worst-case instances, it tends to terminate quickly on practical instances. To explain this discrepancy, the run-time of FLIP has been studied in the smoothed complexity framework. Etscheid and Röglin (ACM Transactions on Algorithms, 2017) showed that the smoothed complexity of FLIP for MAX-CUT in arbitrary graphs is quasi-polynomial. Angel, Bubeck, Peres, and Wei (STOC, 2017) showed that the smoothed complexity of FLIP for MAX-CUT in complete graphs is O(Φ^5 n^{15.1}), where Φ is an upper bound on the random edge-weight density and n is the number of vertices in the input graph. While Angel, Bubeck, Peres, and Wei's result showed the first polynomial smoothed complexity, they also conjectured that their run-time bound is far from optimal. In this work, we make substantial progress toward improving the run-time bound. We prove that the smoothed complexity of FLIP for MAX-CUT in complete graphs is O(Φ n^{7.83}). Our results are based on a carefully chosen matrix whose rank captures the run-time of the method, along with improved rank bounds for this matrix and an improved union bound based on this matrix. In addition, our techniques provide a general framework for analyzing FLIP in the smoothed framework. We illustrate this general framework by showing that the smoothed complexity of FLIP for MAX-3-CUT in complete graphs is polynomial and for MAX-k-CUT in arbitrary graphs is quasi-polynomial. We believe that our techniques should also be of interest toward showing smoothed polynomial complexity of FLIP for MAX-k-CUT in complete graphs for larger constants k.
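For reference, here is a minimal sketch of the FLIP heuristic analyzed in the abstract above: start from an arbitrary bipartition and repeatedly move any single vertex whose move strictly increases the cut weight, until no such move exists. The instance size and weight distribution below are illustrative assumptions; the smoothed-analysis bounds concern how many improving moves such a run can make, while this code only runs the heuristic itself.

```python
import numpy as np

# Minimal sketch of the FLIP local-search heuristic for MAX-CUT: flip single
# vertices across the bipartition while doing so strictly increases the cut.

rng = np.random.default_rng(3)
n = 30
W = rng.random((n, n))            # random edge weights on the complete graph
W = np.triu(W, 1)
W = W + W.T                       # symmetric weights, zero diagonal

def flip_local_search(W, side):
    """Run FLIP to a local optimum; returns the partition and number of flips."""
    n = W.shape[0]
    flips = 0
    improved = True
    while improved:
        improved = False
        for v in range(n):
            # Gain from moving v to the other side: same-side weight minus cross weight.
            same = W[v] @ (side == side[v])
            cross = W[v] @ (side != side[v])
            if same - cross > 1e-12:
                side[v] ^= 1
                flips += 1
                improved = True
    return side, flips

side0 = rng.integers(0, 2, size=n)
side, flips = flip_local_search(W, side0.copy())
cut = W[np.ix_(side == 0, side == 1)].sum()
print("locally optimal cut weight:", round(cut, 3), "after", flips, "flips")
```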