
We prove several hardness results for training depth-2 neural networks with the ReLU activation function; these networks are simply weighted sums (possibly with negative coefficients) of ReLUs. The goal is to output a depth-2 neural network that minimizes the square loss with respect to a given training set. We prove that this problem is NP-hard already for a network with a single ReLU. We also prove NP-hardness of outputting a weighted sum of k ReLUs minimizing the squared error (for k > 1), even in the realizable setting (i.e., when the labels are consistent with an unknown depth-2 ReLU network). We further obtain lower bounds on the running time in terms of the desired additive error ε. To obtain our lower bounds, we use the Gap Exponential Time Hypothesis (Gap-ETH) as well as a new hypothesis regarding the hardness of approximating the well-known Densest k-Subgraph problem in subexponential time (the two hypotheses are used separately in proving different lower bounds). For example, we prove that under reasonable hardness assumptions, any proper learning algorithm for finding the best-fitting ReLU must run in time exponential in (1/ε)^2. Together with previous work on improperly learning a ReLU (Goel et al., COLT'17), this implies the first separation between proper and improper algorithms for learning a ReLU. We also study the problem of properly learning a depth-2 network of ReLUs with bounded weights, giving new (worst-case) upper bounds on the running time needed to learn such networks in both the realizable and agnostic settings. Our upper bounds on the running time essentially match our lower bounds in terms of the dependency on ε.
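To make the objective concrete, the following is a minimal sketch (not the paper's algorithm) of the squared loss being minimized for a single ReLU, L(w) = mean_i (max(0, w·x_i) − y_i)^2, together with gradient descent as a common heuristic; the data, initialization, and step size are all hypothetical, and the paper's point is precisely that no such heuristic can carry a worst-case guarantee unless P = NP.

```python
import numpy as np

def relu_loss(w, X, y):
    """Mean squared error of the single-ReLU hypothesis x -> max(0, w.x)."""
    return np.mean((np.maximum(0.0, X @ w) - y) ** 2)

def relu_grad(w, X, y):
    """Subgradient of the loss; the ReLU contributes only where w.x > 0."""
    z = X @ w
    r = np.maximum(0.0, z) - y
    return 2.0 * X.T @ (r * (z > 0)) / len(y)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = np.maximum(0.0, X @ w_true)        # realizable labels, as in the paper's setting

w = 0.01 * rng.standard_normal(5)      # small random init avoids the dead point w = 0
start = relu_loss(w, X, y)
for _ in range(3000):
    w -= 0.1 * relu_grad(w, X, y)
assert relu_loss(w, X, y) < start      # heuristic progress, not a guarantee
```

Note that gradient descent is an improper-style heuristic here; the hardness results concern the proper problem of exactly minimizing this loss over all single-ReLU hypotheses.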

We consider computational games, sequences of games G = (G1, G2, ...) where, for all n, Gn has the same set of players. Computational games arise in electronic money systems such as Bitcoin, in cryptographic protocols, and in the study of generative adversarial networks in machine learning. Assuming that one-way functions exist, we prove that there is a 2-player zero-sum computational game G such that, for all n, the size of the action space in Gn is polynomial in n and the utility function in Gn is computable in time polynomial in n, and yet there is no ε-Nash equilibrium if players are restricted to using strategies computable by polynomial-time Turing machines, where we use a notion of Nash equilibrium that is tailored to computational games. We also show that an ε-Nash equilibrium may not exist if players are constrained to perform at most T computational steps in each of the games in the sequence. On the other hand, we show that if players can use arbitrary Turing machines to compute their strategies, then every computational game has an ε-Nash equilibrium. These results may shed light on competitive settings where the availability of more running time or faster algorithms can lead to a “computational arms race”, precluding the existence of equilibrium. They also point to inherent limitations of concepts such as “best response” and Nash equilibrium in games with resource-bounded players.
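For a single stage game, the ε-Nash condition being referenced can be checked directly: neither player may gain more than ε by deviating to a best response. The sketch below illustrates this for an ordinary zero-sum matrix game (the game and strategies are hypothetical examples, and computational games add the resource bounds on strategies that this check does not model).

```python
import numpy as np

def is_eps_nash(A, p, q, eps):
    """True iff mixed strategies (p, q) are an eps-Nash equilibrium of the
    zero-sum game with payoff matrix A (row maximizes p.A.q, column minimizes)."""
    value = p @ A @ q
    best_row = np.max(A @ q)   # row player's best pure deviation against q
    best_col = np.min(p @ A)   # column player's best pure deviation against p
    return best_row - value <= eps and value - best_col <= eps

# Matching pennies: the unique equilibrium is uniform mixing, with value 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
uniform = np.array([0.5, 0.5])
assert is_eps_nash(A, uniform, uniform, eps=1e-9)
assert not is_eps_nash(A, np.array([1.0, 0.0]), uniform, eps=0.5)
```

Checking pure deviations suffices because a mixed deviation's payoff is a convex combination of pure-deviation payoffs; the paper's negative results arise when the deviating strategies themselves must be computable within the stated resource bounds.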

Predicting and understanding how people make decisions has been a longstanding goal in many fields, with quantitative models of human decision-making informing research in both the social sciences and engineering. We show how progress toward this goal can be accelerated by using large datasets to power machine-learning algorithms that are constrained to produce interpretable psychological theories. Conducting the largest experiment on risky choice to date and analyzing the results using gradient-based optimization of differentiable decision theories implemented through artificial neural networks, we were able to recapitulate historical discoveries, establish that there is room to improve on existing theories, and discover a new, more accurate model of human decision-making in a form that preserves the insights from centuries of research.
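As a toy illustration of what "gradient-based optimization of a differentiable decision theory" can mean (this is not the paper's model), the sketch below fits a classical power-utility theory of risky choice with a logistic choice rule by gradient descent on choice data; the gambles, parameters (risk attitude alpha, choice sensitivity beta), and synthetic choices are all hypothetical.

```python
import numpy as np

def predict(params, A, B):
    """P(choose gamble A). Each gamble (x, p) pays x with probability p, else 0;
    utilities use a power function, choice follows a logistic rule."""
    alpha, beta = params
    ua = A[:, 1] * A[:, 0] ** alpha
    ub = B[:, 1] * B[:, 0] ** alpha
    return 1.0 / (1.0 + np.exp(-beta * (ua - ub)))

def nll(params, A, B, c):
    """Negative log-likelihood of observed binary choices c."""
    p = np.clip(predict(params, A, B), 1e-9, 1 - 1e-9)
    return -np.mean(c * np.log(p) + (1 - c) * np.log(1 - p))

def numgrad(f, params, h=1e-5):
    """Central-difference gradient, standing in for autodiff."""
    g = np.zeros_like(params)
    for i in range(params.size):
        d = np.zeros_like(params); d[i] = h
        g[i] = (f(params + d) - f(params - d)) / (2 * h)
    return g

rng = np.random.default_rng(1)
n = 300
A = np.column_stack([rng.uniform(1, 5, n), rng.uniform(0.1, 1.0, n)])
B = np.column_stack([rng.uniform(1, 5, n), rng.uniform(0.1, 1.0, n)])
choices = (rng.random(n) < predict(np.array([0.7, 1.0]), A, B)).astype(float)

params = np.array([1.0, 0.2])
start = nll(params, A, B, choices)
for _ in range(400):
    params -= 0.005 * numgrad(lambda q: nll(q, A, B, choices), params)
assert nll(params, A, B, choices) < start
```

The paper's approach replaces such a hand-picked functional form with neural networks constrained to remain interpretable, but the fitting principle, making the theory differentiable and descending its loss on large choice datasets, is the same.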