We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (that may include negative coefficients) of ReLUs. Our goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness for outputting a weighted sum of k ReLUs minimizing the squared error (for k > 1) even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We are also able to obtain lower bounds on the running time in terms of the desired additive error ϵ. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest k-Subgraph problem in subexponential time (these hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in (1/ϵ)². Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks both in the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on ϵ.
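The optimization problem shown NP-hard above can be stated concretely: given samples (xᵢ, yᵢ), find weights w minimizing Σᵢ (max(0, ⟨w, xᵢ⟩) − yᵢ)². A minimal sketch of that objective (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def relu_square_loss(w, X, y):
    """Empirical square loss of a single ReLU x -> max(0, <w, x>)."""
    preds = np.maximum(0.0, X @ w)
    return np.sum((preds - y) ** 2)

# Tiny realizable instance: labels generated by a planted ReLU,
# so the global minimum of the (NP-hard) problem is zero loss.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = np.maximum(0.0, X @ w_true)
```

Evaluating `relu_square_loss(w_true, X, y)` returns 0 on this realizable instance; the hardness result says that finding such a minimizer is intractable in the worst case.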
Low-Depth Arithmetic Circuit Lower Bounds: Bypassing Set-Multilinearization
A recent breakthrough work of Limaye, Srinivasan and Tavenas [Nutan Limaye et al., 2021] proved superpolynomial lower bounds for low-depth arithmetic circuits via a "hardness escalation" approach: they proved lower bounds for low-depth set-multilinear circuits and then lifted the bounds to low-depth general circuits. In this work, we prove superpolynomial lower bounds for low-depth circuits by bypassing the hardness escalation, i.e., the set-multilinearization, step. As set-multilinearization comes with an exponential blowup in circuit size, our direct proof opens up the possibility of proving an exponential lower bound for low-depth homogeneous circuits by evading a crucial bottleneck. Our bounds hold for the iterated matrix multiplication and the Nisan-Wigderson design polynomials. We also define a subclass of unrestricted depth homogeneous formulas which we call unique parse tree (UPT) formulas, and prove superpolynomial lower bounds for these. This significantly generalizes the superpolynomial lower bounds for regular formulas [Neeraj Kayal et al., 2014; Hervé Fournier et al., 2015].
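For context, the iterated matrix multiplication polynomial IMM is the (1,1) entry of a product of d symbolic n×n matrices; every monomial takes exactly one entry from each matrix, which is the set-multilinear structure the abstract refers to. A minimal sketch evaluating that polynomial at a concrete point (names are illustrative):

```python
import numpy as np

def imm_entry(mats):
    """IMM polynomial evaluated at a point: the (1,1) entry of the
    product of d matrices. Each monomial of IMM picks one entry
    from each matrix (the set-multilinear structure)."""
    prod = np.eye(mats[0].shape[0])
    for m in mats:
        prod = prod @ m
    return prod[0, 0]

rng = np.random.default_rng(1)
mats = [rng.standard_normal((3, 3)) for _ in range(4)]  # d = 4, n = 3
value = imm_entry(mats)
```

Evaluation is easy; the lower bounds above concern the size of low-depth arithmetic circuits computing IMM as a formal polynomial in its d·n² variables.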
 Award ID(s):
 2152413
 NSFPAR ID:
 10484523
 Editor(s):
 Etessami, Kousha; Feige, Uriel; Puppis, Gabriele
 Publisher / Repository:
 Schloss Dagstuhl – Leibniz-Zentrum für Informatik
 Date Published:
 Journal Name:
 50th International Colloquium on Automata, Languages, and Programming
 Subject(s) / Keyword(s):
 arithmetic circuits; low-depth circuits; lower bounds; shifted partials; Theory of computation → Algebraic complexity theory
 Format(s):
 Medium: X
 Location:
 Paderborn, Germany
 Sponsoring Org:
 National Science Foundation
More Like this


We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No general SQ lower bounds were known for learning ReLU networks of any depth in this setting: previous SQ lower bounds held only for adversarial noise models (agnostic learning) or restricted models such as correlational SQ. Prior work hinted at the impossibility of our result: Vempala and Wilmes showed that general SQ lower bounds cannot apply to any real-valued family of functions that satisfies a simple non-degeneracy condition. To circumvent their result, we refine a lifting procedure due to Daniely and Vardi that reduces Boolean PAC learning problems to Gaussian ones. We show how to extend their technique to other learning models and, in many well-studied cases, obtain a more efficient reduction. As such, we also prove new cryptographic hardness results for PAC learning two-hidden-layer ReLU networks, as well as new lower bounds for learning constant-depth ReLU networks from label queries.
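In the SQ model referenced above, the learner never sees individual samples; it may only ask for expectations E[φ(x, f(x))], each answered up to an adversarial tolerance τ. A toy simulation of such an oracle (the helper names and the perturbation model are illustrative assumptions, not the paper's construction):

```python
import numpy as np

def sq_oracle(query, data_x, labels, tau, rng):
    """Simulated statistical query oracle: returns E[query(x, f(x))]
    up to a perturbation of magnitude at most tau (here random;
    in the SQ model it may be adversarial)."""
    exact = np.mean([query(x, y) for x, y in zip(data_x, labels)])
    return exact + rng.uniform(-tau, tau)

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 2))
labels = np.sign(X[:, 0])                       # stand-in target function
ans = sq_oracle(lambda x, y: y * x[0], X, labels, tau=0.01, rng=rng)
```

SQ lower bounds like the ones above show that any learner restricted to such queries needs superpolynomially many of them (or inverse-superpolynomial tolerance).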

The classical Beardwood-Halton-Hammersley theorem (1959) asserts the existence of an asymptotic formula of the form $\beta \sqrt n$ for the minimum length of a Traveling Salesperson Tour through $n$ random points in the unit square, and in the decades since it was proved, the existence of such formulas has been shown for other such \emph{Euclidean functionals} on random points in the unit square as well. Despite more than 50 years of attention, however, it remained unknown whether the minimum length TSP through $n$ random points in $[0,1]^2$ was asymptotically distinct from its natural lower bounds, such as the minimum length spanning tree, the minimum length 2-factor, or, as raised by Goemans and Bertsimas, from its linear programming relaxation. We prove that the TSP on random points in Euclidean space is indeed asymptotically distinct from these and other natural lower bounds, and show that this separation implies that branch-and-bound algorithms based on these natural lower bounds must take nearly exponential ($e^{\tilde \Omega(n)}$) time to solve the TSP to optimality, even in the average case. This is the first average-case superpolynomial lower bound for these branch-and-bound algorithms (a lower bound as strong as $e^{\tilde \Omega (n)}$ was not previously known even in worst-case analysis).
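The minimum spanning tree mentioned above is the simplest of these lower bounds: since deleting one edge of a tour yields a spanning path, the MST length never exceeds the optimal tour length, and it also grows like a constant times $\sqrt n$ on random points. A minimal sketch computing it with Prim's algorithm (function names are illustrative):

```python
import numpy as np

def mst_length(points):
    """Prim's algorithm: total edge length of a minimum spanning tree,
    a classical lower bound on the optimal TSP tour length."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best = dist[0].copy()      # cheapest edge from each vertex into the tree
    total = 0.0
    for _ in range(n - 1):
        best[in_tree] = np.inf
        j = int(np.argmin(best))
        total += best[j]
        in_tree[j] = True
        best = np.minimum(best, dist[j])
    return total

rng = np.random.default_rng(3)
pts = rng.random((400, 2))                    # n random points in the unit square
ratio = mst_length(pts) / np.sqrt(len(pts))   # ≈ constant as n grows
```

The theorem above shows the analogous constant for the TSP is strictly larger than the constants for the MST and the other listed relaxations.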

We study the properties of output distributions of noisy random circuits. We obtain upper and lower bounds on the expected distance of the output distribution from the “useless” uniform distribution. These bounds are tight with respect to the dependence on circuit depth. Our proof techniques also allow us to make statements about the presence or absence of anticoncentration for both noisy and noiseless circuits. We uncover a number of interesting consequences for hardness proofs of sampling schemes that aim to show a quantum computational advantage over classical computation. Specifically, we discuss recent barrier results for depth-agnostic and/or noise-agnostic proof techniques. We show that in certain depth regimes, noise-agnostic proof techniques might still work in order to prove an often-conjectured claim in the literature on quantum computational advantage, contrary to what has been thought prior to this work.
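The depth dependence referenced above can be illustrated with a deliberately crude toy model (an assumption for illustration, not the paper's circuit model): if each noisy layer mixes the output distribution with uniform at rate r, the total variation distance from uniform contracts by exactly (1 − r) per layer, i.e., exponentially in depth.

```python
import numpy as np

def tv_from_uniform(p_vec):
    """Total variation distance of a distribution from uniform."""
    u = np.full_like(p_vec, 1.0 / len(p_vec))
    return 0.5 * np.sum(np.abs(p_vec - u))

def apply_noise_layers(p_vec, rate, depth):
    """Toy noise model: each layer replaces the distribution by a
    (1 - rate) mixture with uniform, so the distance from uniform
    shrinks by a factor (1 - rate) per layer."""
    u = np.full_like(p_vec, 1.0 / len(p_vec))
    for _ in range(depth):
        p_vec = (1 - rate) * p_vec + rate * u
    return p_vec

start = np.array([1.0, 0.0, 0.0, 0.0])   # a concentrated "ideal" output
noisy = apply_noise_layers(start, rate=0.2, depth=5)
```

The actual bounds in the work above are for noisy random quantum circuits, where the depth dependence is more subtle; this sketch only shows why distance from uniform is the natural figure of merit.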

One powerful theme in complexity theory and pseudorandomness in the past few decades has been the use of lower bounds to give pseudorandom generators (PRGs). However, the general results using this hardness vs. randomness paradigm suffer from a quantitative loss in parameters, and hence do not give nontrivial implications for models where we don’t know superpolynomial lower bounds but do know lower bounds of a fixed polynomial. We show that when such lower bounds are proved using random restrictions, we can construct PRGs which are essentially best possible without in turn improving the lower bounds. More specifically, say that a circuit family has shrinkage exponent $\Gamma$ if a random restriction leaving a $p$ fraction of variables unset shrinks the size of any circuit in the family by a factor of $p^{\Gamma + o(1)}$. Our PRG uses a seed of length $s^{1/(\Gamma + 1) + o(1)}$ to fool circuits in the family of size $s$. By using this generic construction, we get PRGs with polynomially small error for the following classes of circuits of size $s$ and with the following seed lengths: (1) for de Morgan formulas, seed length $s^{1/3 + o(1)}$; (2) for formulas over an arbitrary basis, seed length $s^{1/2 + o(1)}$; (3) for read-once de Morgan formulas, seed length $s^{.234...}$; (4) for branching programs of size $s$, seed length $s^{1/2 + o(1)}$. The previous best PRGs known for these classes used seeds of length bigger than $n/2$ to output $n$ bits, and worked only for size $s = O(n)$ [8].
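The arithmetic linking the two quantities above is simple: a shrinkage exponent $\Gamma$ yields seed length $s^{1/(\Gamma+1)+o(1)}$, which reproduces each of the listed exponents. A minimal sketch (ignoring the $o(1)$ terms; the mapping of classes to $\Gamma$ values is inferred from the seed lengths quoted above):

```python
def seed_length_exponent(shrinkage_exponent):
    """Seed length exponent 1/(Gamma + 1) from shrinkage exponent Gamma,
    dropping the o(1) terms."""
    return 1.0 / (shrinkage_exponent + 1.0)

# Shrinkage exponents implied by the seed lengths listed above:
classes = {
    "de Morgan formulas": 2.0,                 # Gamma = 2   -> s^(1/3)
    "formulas over an arbitrary basis": 1.0,   # Gamma = 1   -> s^(1/2)
    "branching programs": 1.0,                 # Gamma = 1   -> s^(1/2)
    "read-once de Morgan formulas": 3.27,      # Gamma ~ 3.27 -> s^(.234)
}
exponents = {name: seed_length_exponent(g) for name, g in classes.items()}
```

The generic construction thus converts any improvement in a class's shrinkage exponent directly into a shorter PRG seed.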