Title: How to Guess a Gradient
How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is “very little.” However, in this paper, we show that gradients are more structured than previously thought. Gradients lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Exploiting this structure can significantly improve gradient-free optimization schemes based on directional derivatives, which have struggled to scale beyond small networks trained on toy datasets. We study how to narrow the gap in optimization performance between methods that calculate exact gradients and those that use directional derivatives. Furthermore, we highlight new challenges in overcoming the large gap between optimizing with exact gradients and guessing the gradients.
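As a concrete illustration of the setting, the sketch below uses forward-mode automatic differentiation (jax.jvp) to turn a guessed direction into a gradient estimate: sample a direction v, measure the directional derivative of the loss along v, and step along (directional derivative) × v. The toy linear model, isotropic Gaussian guesses, and step size are illustrative assumptions only; the paper's argument is that structured, architecture-aware guess directions do far better than isotropic noise.

```python
# Minimal sketch: estimate a gradient from a single guessed direction using
# only a forward-mode directional derivative (no backprop, no full gradient).
# The model, data, guess distribution, and learning rate are illustrative.
import jax
import jax.numpy as jnp

def loss(w, x, y):
    # Toy linear least-squares loss standing in for a network's training loss.
    return jnp.mean((x @ w - y) ** 2)

def guessed_gradient_step(w, x, y, key, lr=1e-2):
    v = jax.random.normal(key, w.shape)                       # guessed direction
    _, dirderiv = jax.jvp(lambda w_: loss(w_, x, y), (w,), (v,))
    g_hat = dirderiv * v                                      # unbiased gradient estimate
    return w - lr * g_hat

key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 8))
y = x @ jnp.arange(8.0)
w = jnp.zeros(8)
for _ in range(500):
    key, sub = jax.random.split(key)
    w = guessed_gradient_step(w, x, y, sub)
```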
Award ID(s):
2134108
PAR ID:
10565442
Publisher / Repository:
arXiv.org
Date Published:
Journal Name:
arXiv.org
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
1. We present a stochastic descent algorithm for unconstrained optimization that is particularly efficient when the objective function is slow to evaluate and gradients are not easily obtained, as in some PDE-constrained optimization and machine learning problems. The algorithm maps the gradient onto a low-dimensional random subspace at each iteration, similar to coordinate descent but without restricting directional derivatives to be along the axes. Without requiring a full gradient, this mapping can be performed by computing directional derivatives (e.g., via forward-mode automatic differentiation). We give proofs of convergence in expectation under various convexity assumptions, as well as probabilistic convergence results under strong convexity. Our method provides a novel extension of the well-known Gaussian smoothing technique to descent in subspaces of dimension greater than one, opening the door to new analysis of Gaussian smoothing when more than one directional derivative is used at each iteration. We also provide a finite-dimensional variant of a special case of the Johnson–Lindenstrauss lemma. Experimentally, we show that our method compares favorably to coordinate descent, Gaussian smoothing, gradient descent, and BFGS (when gradients are calculated via forward-mode automatic differentiation) on problems from the machine learning and shape optimization literature.
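A minimal sketch of the subspace idea above, assuming a simple quadratic test objective, a fixed step size, and a QR-orthonormalized Gaussian basis (none of which come from the paper's own setup or theory): draw a random low-dimensional basis, compute one directional derivative per basis vector with forward-mode AD, and descend along the resulting projected gradient.

```python
# Minimal sketch: descend in a random low-dimensional subspace using only
# directional derivatives (forward-mode AD), never a full gradient.
import jax
import jax.numpy as jnp

def f(x):
    # Mildly ill-conditioned quadratic used purely as a test objective.
    return 0.5 * jnp.sum(jnp.linspace(0.1, 1.0, x.size) * x ** 2)

def subspace_step(x, key, ell=5, lr=0.5):
    d = x.size
    # Random subspace basis with orthonormal columns.
    P, _ = jnp.linalg.qr(jax.random.normal(key, (d, ell)))
    # One directional derivative per basis column.
    dirderivs = jnp.array([jax.jvp(f, (x,), (P[:, j],))[1] for j in range(ell)])
    return x - lr * (P @ dirderivs)   # step along the projected gradient P P^T grad f

key = jax.random.PRNGKey(1)
x = jnp.ones(50)
for _ in range(200):
    key, sub = jax.random.split(key)
    x = subspace_step(x, sub)
```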
2. The successes of deep learning, variational inference, and many other fields have been aided by specialized implementations of reverse-mode automatic differentiation (AD) to compute gradients of mega-dimensional objectives. The AD techniques underlying these tools were designed to compute exact gradients to numerical precision, but modern machine learning models are almost always trained with stochastic gradient descent. Why spend computation and memory on exact (minibatch) gradients only to use them for stochastic optimization? We develop a general framework and approach for randomized automatic differentiation (RAD), which allows unbiased gradient estimates to be computed with reduced memory in exchange for increased variance. We examine limitations of the general approach, and argue that we must leverage problem-specific structure to realize benefits. We develop RAD techniques for a variety of simple neural network architectures, and show that for a fixed memory budget, RAD converges in fewer iterations than using a small batch size for feedforward networks, and in a similar number for recurrent networks. We also show that RAD can be applied to scientific computing, and use it to develop a low-memory stochastic gradient method for optimizing the control parameters of a linear reaction-diffusion PDE representing a fission reactor.
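As a small illustration of the trade-off described above (not the paper's general framework), the sketch below randomizes the backward pass of a single dense layer: only a Bernoulli-sparsified, rescaled copy of the activations would be stored for the backward pass, so the weight gradient is unbiased but noisier in exchange for less saved state. The shapes and keep probability are arbitrary assumptions.

```python
# Minimal sketch: an unbiased, lower-memory estimate of a dense layer's
# weight gradient obtained by sparsifying the saved activations at random.
import jax
import jax.numpy as jnp

def sparsify(key, x, keep_prob):
    # Keep each entry with probability keep_prob and rescale, so E[result] = x.
    mask = jax.random.bernoulli(key, keep_prob, x.shape)
    return jnp.where(mask, x / keep_prob, 0.0)

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
acts = jax.random.normal(k1, (128, 64))    # activations entering the layer
delta = jax.random.normal(k2, (128, 32))   # upstream gradient at the layer output

exact_dW = acts.T @ delta                          # what exact reverse-mode AD gives
acts_saved = sparsify(k3, acts, keep_prob=0.25)    # the only copy kept for backward
rad_dW = acts_saved.T @ delta                      # unbiased but higher-variance estimate

rel_err = jnp.linalg.norm(rad_dW - exact_dW) / jnp.linalg.norm(exact_dW)
```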
3. Coded distributed computation has become common practice for performing gradient descent on large datasets to mitigate stragglers and other faults. This paper proposes a novel algorithm that encodes the partial derivatives themselves and, furthermore, optimizes the codes by performing lossy compression on the derivative codewords, maximizing the information contained within each codeword while minimizing the information shared between codewords. The utility of this application of coding theory is a geometrical consequence of the observed fact in optimization research that noise is tolerable, and sometimes even helpful, in gradient-descent-based learning algorithms, since it helps avoid overfitting and local minima. This stands in contrast with much current work on distributed coded computation, which focuses on recovering all of the data from the workers. A second contribution is that the low-weight nature of the coding scheme allows for asynchronous gradient updates, since the code can be iteratively decoded; i.e., a worker's task can immediately be updated into the larger gradient. The directional derivative is always a linear function of the direction vectors; thus, our framework is robust, since it can apply linear coding techniques to general machine learning frameworks such as deep neural networks.
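For intuition, here is a minimal exact gradient-coding sketch with three workers and one tolerated straggler; the cyclic code and least-squares decoding are a standard textbook construction, not this paper's lossy, information-optimized codes or its iterative decoder.

```python
# Minimal sketch: each worker sends one linear combination (codeword) of the
# partial gradients; the full gradient is decoded from any 2 of the 3 workers.
import jax.numpy as jnp

g = jnp.array([[1.0, 2.0],        # partial gradients from k = 3 data partitions
               [3.0, -1.0],
               [0.5, 0.5]])

B = jnp.array([[0.5, 1.0, 0.0],   # row j = worker j's combination coefficients
               [0.0, 1.0, -1.0],
               [0.5, 0.0, 1.0]])
codewords = B @ g                 # what each worker transmits

# Suppose worker 2 straggles: decode the full gradient from workers {0, 1}.
S = jnp.array([0, 1])
a, *_ = jnp.linalg.lstsq(B[S].T, jnp.ones(3))   # find a with a^T B_S = 1^T
decoded = a @ codewords[S]                      # equals g.sum(axis=0)
```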
4. The paper discusses derivative-free optimization (DFO), which involves minimizing a function without access to gradients or directional derivatives, only function evaluations. Classical DFO methods such as Nelder-Mead and direct search have limited scalability for high-dimensional problems. Zeroth-order methods, which mimic gradient-based methods, have been gaining popularity due to the demands of large-scale machine learning applications. This paper focuses on the selection of the step size $\alpha_k$ in such methods. The proposed approach, called Curvature-Aware Random Search (CARS), uses first- and second-order finite difference approximations to compute a candidate $\alpha_+$. A safeguarding step then evaluates $\alpha_+$ and chooses an alternate step size in case $\alpha_+$ does not decrease the objective function. We prove that for strongly convex objective functions, CARS converges linearly provided that the search direction is drawn from a distribution satisfying very mild conditions. We also present a Cubic Regularized variant of CARS, named CARS-CR, which provably converges at a rate of $O(1/k)$ without the assumption of strong convexity. Numerical experiments show that CARS and CARS-CR match or exceed the state-of-the-art on benchmark problem sets.
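The sketch below is a curvature-aware random-search step in the spirit of CARS; the direction distribution, finite-difference spacing h, fallback step, and Rosenbrock test function are illustrative assumptions rather than the paper's exact safeguarding scheme or its strongly convex setting.

```python
# Minimal sketch: estimate the directional derivative and curvature along a
# random direction by finite differences, take a Newton-like step along that
# direction, and keep the current point if the objective does not decrease.
import jax
import jax.numpy as jnp

def cars_like_step(f, x, key, h=1e-4):
    u = jax.random.normal(key, x.shape)
    u = u / jnp.linalg.norm(u)
    fp, f0, fm = f(x + h * u), f(x), f(x - h * u)
    d = (fp - fm) / (2.0 * h)               # first derivative along u
    c = (fp - 2.0 * f0 + fm) / (h * h)      # curvature along u
    alpha = jnp.where(c > 0, d / c, h)      # candidate step size
    x_plus = x - alpha * u
    return x_plus if f(x_plus) < f0 else x  # safeguard: accept only on decrease

def rosenbrock(x):
    return jnp.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

key = jax.random.PRNGKey(2)
x = jnp.zeros(10)
for _ in range(1000):
    key, sub = jax.random.split(key)
    x = cars_like_step(rosenbrock, x, sub)
```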
5. We provide tight finite-time convergence bounds for gradient descent and stochastic gradient descent on quadratic functions, when the gradients are delayed and reflect iterates from τ rounds ago. First, we show that without stochastic noise, delays strongly affect the attainable optimization error: in fact, the error can be as bad as that of non-delayed gradient descent run on only 1/τ of the gradients. In sharp contrast, we quantify how stochastic noise makes the effect of delays negligible, improving on previous work which only showed this phenomenon asymptotically or for much smaller delays. Also, in the context of distributed optimization, the results indicate that the performance of gradient descent with delays is competitive with synchronous approaches such as mini-batching. Our results are based on a novel technique for analyzing convergence of optimization algorithms using generating functions.
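To make the delayed-gradient setting concrete, the sketch below runs gradient descent on a toy quadratic but applies, at each round, the gradient of the iterate from τ rounds earlier; the quadratic, delay, and step size are illustrative choices, not the setup of the bounds above.

```python
# Minimal sketch: gradient descent on a quadratic where each update uses the
# gradient evaluated at the iterate from tau rounds ago.
import jax
import jax.numpy as jnp

A = jnp.diag(jnp.linspace(1.0, 10.0, 20))        # f(x) = 0.5 * x^T A x
grad_f = jax.grad(lambda x: 0.5 * x @ (A @ x))

tau, lr = 5, 0.01
x = jnp.ones(20)
history = [x] * (tau + 1)                        # buffer of the last tau+1 iterates
for _ in range(300):
    g = grad_f(history[0])                       # gradient from tau rounds ago
    x = x - lr * g
    history = history[1:] + [x]
```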