Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Goel, Surbhi; Karmalkar, Sushrut; Klivans, Adam

We consider the problem of computing the best-fitting ReLU with respect to square loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let opt < 1 be the population loss of the best-fitting ReLU. We prove:

1. Finding a ReLU with square loss opt + ε is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.
2. There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error O(opt^{2/3}). The algorithm uses a novel reduction to noisy halfspace learning with respect to 0/1 loss.

Prior work due to Soltanolkotabi [Sol17] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals if the training set is exactly labeled by a ReLU.
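To make the setting concrete, the following is a minimal NumPy sketch of the objective (empirical square loss of x -> max(0, <w, x>) on spherical Gaussian examples) and of plain gradient descent in the exactly labeled case studied in [Sol17]. It is not the paper's O(opt^{2/3}) algorithm; the reduction to noisy halfspace learning is not reproduced here. The function names (relu, square_loss, moment_init, gradient_descent_relu), the step size, the iteration count, the target vector, and the moment-based warm start (justified by Stein's lemma: if y = max(0, <w*, x>) with x ~ N(0, I), then E[y x] = w*/2) are all assumptions of this sketch.

```python
import numpy as np


def relu(z):
    """Rectified linear unit, applied elementwise."""
    return np.maximum(z, 0.0)


def square_loss(w, X, y):
    """Empirical square loss of the ReLU x -> max(0, <w, x>) on samples (X, y)."""
    return float(np.mean((relu(X @ w) - y) ** 2))


def moment_init(X, y):
    """Warm start; an assumption of this sketch, not the paper's method.

    If x ~ N(0, I) and y = max(0, <w*, x>), Stein's lemma gives
    E[y x] = P(<w*, x> > 0) w* = w*/2, so 2 * mean(y_i x_i) estimates w*.
    """
    return 2.0 * (X.T @ y) / X.shape[0]


def gradient_descent_relu(X, y, steps=500, lr=0.5):
    """Plain (sub)gradient descent on the empirical square loss (hypothetical helper)."""
    n = X.shape[0]
    w = moment_init(X, y)
    for _ in range(steps):
        z = X @ w
        residual = relu(z) - y
        # Gradient of mean((relu(Xw) - y)^2), using 1{z > 0} as the ReLU derivative.
        grad = (2.0 / n) * (X.T @ (residual * (z > 0)))
        w = w - lr * grad
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 2000, 5
    X = rng.standard_normal((n, d))      # spherical Gaussian marginals
    w_star = np.ones(d) / np.sqrt(d)     # hypothetical target ReLU
    y = relu(X @ w_star)                 # exactly labeled (realizable) case
    w_hat = gradient_descent_relu(X, y)
    print(np.linalg.norm(w_hat - w_star), square_loss(w_hat, X, y))
```

With arbitrary labels the same loop still runs, but the first result above says gradient descent cannot, in general, reach the global minimum in polynomial time; the sketch only illustrates the exactly labeled regime.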

Award ID(s): 1717896

PAR ID: 10190451

Author(s) / Creator(s): Goel, Surbhi; Karmalkar, Sushrut; Klivans, Adam

Date Published: 2019-01-01

Journal Name: Advances in Neural Information Processing Systems

ISSN: 1049-5258

Page Range / eLocation ID: 8582-8591

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0 (Conference Paper)
DOI: not currently available.
