Title: How Deep Sparse Networks Avoid the Curse of Dimensionality: Efficiently Computable Functions are Compositionally Sparse
The main claim of this perspective is that compositional sparsity of the target function, which corresponds to the task to be learned, is the key principle underlying machine learning. Mhaskar and Poggio (2016) proved that sparsity of compositional target functions (when the functions are on the reals, the constituent functions are also required to be smooth) naturally leads to sparse deep networks for approximation and thus for optimization. This is the case for most CNNs in current use, in which the known sparse graph of the target function is reflected in the sparse connectivity of the network. When the graph of the target function is unknown, I conjecture that transformers are able to implement a flexible version of sparsity, selecting through the self-attention layers which input tokens interact in the MLP layer. Surprisingly, the assumption of compositional sparsity of the target function is not restrictive in practice, since I prove here that for computable functions (if on the reals, with Lipschitz continuous derivatives) compositional sparsity is equivalent to efficient computability, that is, Turing computability in polynomial time.
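To make the central notion concrete: a function of many variables is compositionally sparse when it is built from constituent functions that each depend on only a few arguments. Below is a minimal, hypothetical Python sketch (the constituent function g and the tree structure are illustrative choices, not from the paper) of a compositionally sparse function of 8 variables built from 7 two-variable constituents arranged as a binary tree. A deep network whose connectivity mirrors this tree can approximate the function with complexity that scales with the number of constituents rather than exponentially in the input dimension.

```python
import numpy as np

def g(a, b):
    # A smooth two-variable constituent function (illustrative choice).
    return np.tanh(a + 2.0 * b)

def compositionally_sparse_f(x):
    """Evaluate a function of 8 variables as a binary tree of
    two-variable constituents: 4 + 2 + 1 = 7 applications of g.
    Each constituent sees only 2 of its layer's inputs, never all 8."""
    h1 = [g(x[2 * i], x[2 * i + 1]) for i in range(4)]    # layer 1: 4 nodes
    h2 = [g(h1[2 * i], h1[2 * i + 1]) for i in range(2)]  # layer 2: 2 nodes
    return g(h2[0], h2[1])                                # root node

x = np.random.randn(8)
print(compositionally_sparse_f(x))
```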
Award ID(s):
2134108
PAR ID:
10565465
Author(s) / Creator(s):
Publisher / Repository:
Center for Brains, Minds and Machines (CBMM)
Date Published:
Format(s):
Medium: X
Institution:
Massachusetts Institute of Technology
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks. These bounds differ from previous ones in that they consider the norms of the convolutional filters instead of the norms of the associated Toeplitz matrices, independently of weight sharing between neurons. As we show theoretically, these bounds may be orders of magnitude better than standard norm-based generalization bounds; empirically, they are almost non-vacuous in estimating generalization in various simple classification problems. Taken together, these results suggest that compositional sparsity of the underlying target function is critical to the success of deep neural networks.
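The distinction drawn in the entry above between filter norms and Toeplitz-matrix norms can be seen numerically. Here is a small numpy sketch (an illustration of the gap, not the paper's bound) that builds the Toeplitz matrix of a 1-D convolution and shows that its Frobenius norm exceeds the filter norm by a factor of sqrt(number of output positions), the inflation introduced by counting shared weights once per neuron.

```python
import numpy as np

def conv_toeplitz(w, n):
    """Toeplitz matrix T of a 1-D 'valid' convolution with filter w,
    so that T @ x == np.convolve(x, w[::-1], mode='valid')."""
    k = len(w)
    m = n - k + 1
    T = np.zeros((m, n))
    for i in range(m):
        T[i, i:i + k] = w  # each row is a shifted copy of the filter
    return T

n, k = 256, 5
w = np.random.randn(k)
T = conv_toeplitz(w, n)

# The filter norm is independent of the input length, while the
# Frobenius norm of the Toeplitz matrix grows with the number of
# output positions: ||T||_F = sqrt(n - k + 1) * ||w||_2.
print(np.linalg.norm(w))                       # filter norm
print(np.linalg.norm(T))                       # Toeplitz norm (much larger)
print(np.sqrt(n - k + 1) * np.linalg.norm(w))  # matches ||T||_F
```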
  2. Classical results in sparse recovery guarantee the exact reconstruction of s-sparse signals under assumptions on the dictionary that are either too strong or NP-hard to check. Moreover, such results may be pessimistic in practice since they are based on a worst-case analysis. In this paper, we consider the sparse recovery of signals defined over a graph, for which the dictionary takes the form of an incidence matrix. We derive necessary and sufficient conditions for sparse recovery, which depend on properties of the cycles of the graph that can be checked in polynomial time. We also derive support-dependent conditions for sparse recovery that depend only on the intersection of the cycles of the graph with the support of the signal. Finally, we exploit sparsity properties on the measurements and the structure of incidence matrices to propose a specialized sub-graph-based recovery algorithm that outperforms the standard l1-minimization approach.
  3. Classical results in sparse representation guarantee the exact recovery of sparse signals under assumptions on the dictionary that are either too strong or NP-hard to check. Moreover, such results may be too pessimistic in practice since they are based on a worst-case analysis. In this paper, we consider the sparse recovery of signals defined over a graph, for which the dictionary takes the form of an incidence matrix. We show that in this case necessary and sufficient conditions can be derived in terms of properties of the cycles of the graph, which can be checked in polynomial time. Our analysis further allows us to derive location-dependent conditions for recovery that depend only on the cycles of the graph that intersect the support of the signal. Finally, we exploit sparsity properties on the measurements to propose a specialized sub-graph-based recovery algorithm that outperforms standard l1-minimization.
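Both of the two preceding entries use l1-minimization over an incidence-matrix dictionary as the baseline. Here is a minimal sketch of that baseline, assuming scipy is available (the graph, the edge orientation convention, and the signal are illustrative): recover a sparse edge signal x from node measurements y = Bx by solving the standard linear-programming form of min ||x||_1 subject to Bx = y.

```python
import numpy as np
from scipy.optimize import linprog

def incidence_matrix(n_nodes, edges):
    """Oriented node-edge incidence matrix B: the column for edge (u, v)
    has -1 at row u and +1 at row v."""
    B = np.zeros((n_nodes, len(edges)))
    for e, (u, v) in enumerate(edges):
        B[u, e] = -1.0
        B[v, e] = 1.0
    return B

# A small graph: a 6-cycle with one chord (6 nodes, 7 edges).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
B = incidence_matrix(6, edges)

# A 2-sparse signal on the edges and its node measurements y = B @ x.
m = len(edges)
x_true = np.zeros(m)
x_true[1], x_true[4] = 3.0, -2.0
y = B @ x_true

# l1-minimization as a linear program: write x = p - q with p, q >= 0,
# then minimize sum(p) + sum(q) subject to B @ (p - q) == y.
res = linprog(c=np.ones(2 * m),
              A_eq=np.hstack([B, -B]), b_eq=y,
              bounds=[(0, None)] * (2 * m))
x_hat = res.x[:m] - res.x[m:]
print(np.round(x_hat, 6))  # should match x_true when recovery succeeds
```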
  4. Neural networks have demonstrated impressive success in various domains, raising the question of what fundamental principles underlie the effectiveness of the best AI systems and quite possibly of human intelligence. This perspective argues that compositional sparsity, or the property that a compositional function has “few” constituent functions, each depending on only a small subset of inputs, is a key principle underlying successful learning architectures. Surprisingly, all functions that are efficiently Turing computable have a compositionally sparse representation. Furthermore, deep networks that are also sparse can exploit this general property to avoid the “curse of dimensionality”. This framework suggests interesting implications about the role that machine learning may play in mathematics.
  5. A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons/weights that can be deleted from the network, this form of “dynamic” activation sparsity appears to be harder to exploit to obtain more efficient networks. Motivated by this, we initiate a formal study of the PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of sparsely activated networks will lead to methods that can exploit activation sparsity in practice.
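The activation sparsity discussed in the entry above is straightforward to measure. Here is a small numpy sketch (the layer sizes and random initialization are illustrative, not from the paper) that computes the fraction of hidden units a ReLU MLP block activates per input; at random initialization this fraction is near one half, whereas the phenomenon described in the abstract is that trained networks activate far fewer units on any given input.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random MLP block: d -> h -> d with a ReLU in between (illustrative
# sizes; transformer MLP blocks commonly use h = 4 * d).
d, h = 64, 256
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
b1 = rng.normal(size=h)
W2 = rng.normal(size=(d, h)) / np.sqrt(h)

def mlp_block(x):
    a = np.maximum(0.0, W1 @ x + b1)  # hidden activations after ReLU
    return W2 @ a, a

# Fraction of hidden units active per input, averaged over random inputs.
X = rng.normal(size=(1000, d))
active = np.mean([(mlp_block(x)[1] > 0).mean() for x in X])
print(f"mean fraction of active hidden units: {active:.3f}")
```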