skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning Narrow One-Hidden-Layer ReLU Networks
We consider the well-studied problem of learning a linear combination of k ReLU activations with respect to a Gaussian distribution on inputs in d dimensions. We give the first polynomial-time algorithm that succeeds whenever k is a constant. All prior polynomial-time learners require additional assumptions on the network, such as positive combining coefficients or the matrix of hidden weight vectors being well-conditioned. Our approach is based on analyzing random contractions of higher-order moment tensors. We use a multi-scale analysis to argue that sufficiently close neurons can be collapsed together, sidestepping the conditioning issues present in prior work. This allows us to design an iterative procedure to discover individual neurons.  more » « less
Award ID(s):
1909204
PAR ID:
10478029
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
COLT 2023
Date Published:
Journal Name:
Conference on Learning Theory
Format(s):
Medium: X
Location:
Bangalore, India
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We give new and efficient black-box reconstruction algorithms for some classes of depth-3 arithmetic circuits. As a consequence, we obtain the first efficient algorithm for computing the tensor rank and for finding the optimal tensor decomposition as a sum of rank-one tensors when then input is a constant-rank tensor. More specifically, we provide efficient learning algorithms that run in randomized polynomial time over general fields and in deterministic polynomial time over and for the following classes: 1) Set-multilinear depth-3 circuits of constant top fan-in ((k) circuits). As a consequence of our algorithm, we obtain the first polynomial time algorithm for tensor rank computation and optimal tensor decomposition of constant-rank tensors. This result holds for d dimensional tensors for any d, but is interesting even for d=3. 2) Sums of powers of constantly many linear forms ((k) circuits). As a consequence we obtain the first polynomial-time algorithm for tensor rank computation and optimal tensor decomposition of constant-rank symmetric tensors. 3) Multilinear depth-3 circuits of constant top fan-in (multilinear (k) circuits). Our algorithm works over all fields of characteristic 0 or large enough characteristic. Prior to our work the only efficient algorithms known were over polynomially-sized finite fields (see. Karnin-Shpilka 09’). Prior to our work, the only polynomial-time or even subexponential-time algorithms known (deterministic or randomized) for subclasses of (k) circuits that also work over large/infinite fields were for the setting when the top fan-in k is at most 2 (see Sinha 16’ and Sinha 20’). 
    more » « less
  2. Reservoir computing advances the intriguing idea that a nonlinear recurrent neural circuit—the reservoir—can encode spatio-temporal input signals to enable efficient ways to perform tasks like classification or regression. However, recently the idea of a monolithic reservoir network that simultaneously buffers input signals and expands them into nonlinear features has been challenged. A representation scheme in which memory buffer and expansion into higher-order polynomial features can be configured separately has been shown to significantly outperform traditional reservoir computing in prediction of multivariate time-series. Here we propose a configurable neuromorphic representation scheme that provides competitive performance on prediction, but with significantly better scaling properties than directly materializing higher-order features as in prior work. Our approach combines the use of randomized representations from traditional reservoir computing with mathematical principles for approximating polynomial kernels via such representations. While the memory buffer can be realized with standard reservoir networks, computing higher-order features requires networks of ‘Sigma-Pi’ neurons, i.e., neurons that enable both summation as well as multiplication of inputs. Finally, we provide an implementation of the memory buffer and Sigma-Pi networks on Loihi 2, an existing neuromorphic hardware platform. 
    more » « less
  3. Megow, Nicole; Smith, Adam (Ed.)
    We prove that for some constant a > 1, for all k ≤ a, MATIME[n^{k(1+o(1))}]/1 ⊄ SIZE[O(n^k)], for some specific o(1) function. This is a super linear polynomial circuit lower bound. Previously, Santhanam [Santhanam, 2007] showed that there exists a constant c>1 such that for all k>1: MATIME[n^{ck}]/1 ⊄ SIZE[O(n^k)]. Inherently to Santhanam’s proof, c is a large constant and there is no upper bound on c. Using ideas from Murray and Williams [Murray and Williams, 2018], one can show for all k>1: MATIME [n^{10 k²}]/1 ⊄ SIZE[O(n^k)]. To prove this result, we construct the first PCP for SPACE[n] with quasi-linear verifier time: our PCP has a Õ(n) time verifier, Õ(n) space prover, O(log(n)) queries, and polynomial alphabet size. Prior to this work, PCPs for SPACE[O(n)] had verifiers that run in Ω(n²) time. This PCP also proves that NE has MIP verifiers which run in time Õ(n). 
    more » « less
  4. Given a length n sample from R^d and a neural network with a fixed architecture with W weights, k neurons, linear threshold activation functions, and binary outputs on each neuron, we study the problem of uniformly sampling from all possible labelings on the sample corresponding to different choices of weights. We provide an algorithm that runs in time polynomial both in n and W such that any labeling appears with probability at least (W2ekn)^W for W < n. For a single neuron, we also provide a random walk based algorithm that samples exactly uniformly. 
    more » « less
  5. Multiple known algorithmic paradigms (backtracking, local search and the polynomial method) only yield a 2n(1−1/O(k)) time algorithm for k-SAT in the worst case. For this reason, it has been hypothesized that the worst-case k-SAT problem cannot be solved in 2n(1−f(k)/k) time for any unbounded function f. This hypothesis has been called the “Super-Strong ETH”, modeled after the ETH and the Strong ETH. We give two results on the Super-Strong ETH: 1. It has also been hypothesized that k-SAT is hard to solve for randomly chosen instances near the “critical threshold”, where the clause-to-variable ratio is 2^kln2−Θ(1). We give a randomized algorithm which refutes the Super-Strong ETH for the case of random k-SAT and planted k-SAT for any clause-to-variable ratio. For example, given any random k-SAT instance F with n variables and m clauses, our algorithm decides satisfiability for F in 2^n(1−Ω(logk)/k) time, with high probability (over the choice of the formula and the randomness of the algorithm). It turns out that a well-known algorithm from the literature on SAT algorithms does the job: the PPZ algorithm of Paturi, Pudlák and Zane [17]. 2. The Unique k-SAT problem is the special case where there is at most one satisfying assignment. Improving prior reductions, we show that the Super-Strong ETHs for Unique k-SAT and k-SAT are equivalent. More precisely, we show the time complexities of Unique k-SAT and k-SAT are very tightly correlated: if Unique k-SAT is in 2^n(1−f(k)/k) time for an unbounded f, then k-SAT is in 2^n(1−f(k)(1−ε)/k) time for every ε>0. 
    more » « less