Title: Deep Network Approximation: Achieving Arbitrary Accuracy with Fixed Number of Neurons
This paper develops simple feed-forward neural networks that achieve the universal approximation property for all continuous functions with a fixed finite number of neurons. These neural networks are simple because they are designed with a simple and computable continuous activation function $\sigma$ leveraging a triangular-wave function and a softsign function. We prove that $\sigma$-activated networks with width $36d(2d+1)$ and depth $11$ can approximate any continuous function on a $d$-dimensional hypercube within an arbitrarily small error. Hence, for supervised learning and its related regression problems, the hypothesis space generated by these networks with a size not smaller than $36d(2d+1)\times 11$ is dense in the space of continuous functions. Furthermore, classification functions arising from image and signal classification are in the hypothesis space generated by $\sigma$-activated networks with width $36d(2d+1)$ and depth $12$, when there exist pairwise disjoint closed bounded subsets of $\mathbb{R}^d$ such that the samples of the same class are located in the same subset.
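The abstract does not spell out the formula for $\sigma$; purely as an illustration, the following NumPy sketch shows one way a triangular wave and a softsign can be spliced into a single continuous activation, and instantiates a network of the stated width $36d(2d+1)$ and depth $11$. The function name `euaf_like`, the piecewise split at $x=0$, and the random weights are assumptions made for this sketch, not the paper's construction.

```python
import numpy as np

def euaf_like(x):
    """Illustrative continuous activation: a triangular wave on [0, inf)
    spliced with the softsign x/(1+|x|) on (-inf, 0). The exact activation
    sigma in the paper may differ; this is only a sketch."""
    x = np.asarray(x, dtype=float)
    tri = np.abs(x - 2 * np.floor((x + 1) / 2))   # period-2 triangular wave, values in [0, 1]
    soft = x / (1 + np.abs(x))                    # softsign, values in (-1, 0) for x < 0
    return np.where(x >= 0, tri, soft)

def random_fixed_size_network(d, rng=np.random.default_rng(0)):
    """A feed-forward net of width 36*d*(2*d+1) and depth 11, as in the abstract.
    Weights are random here; the paper constructs specific weights."""
    width = 36 * d * (2 * d + 1)
    dims = [d] + [width] * 11 + [1]
    params = [(rng.standard_normal((m, n)) / np.sqrt(n), np.zeros(m))
              for n, m in zip(dims[:-1], dims[1:])]
    def forward(x):
        h = np.atleast_2d(x).T            # shape (d, batch)
        for i, (W, b) in enumerate(params):
            h = W @ h + b[:, None]
            if i < len(params) - 1:       # activation on hidden layers only
                h = euaf_like(h)
        return h.ravel()
    return forward

net = random_fixed_size_network(d=2)
print(net(np.array([0.3, 0.7])))          # a single scalar output
```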
Award ID(s):
2244988 2206333
PAR ID:
10426137
Author(s) / Creator(s):
; ;
Editor(s):
Ruslan Salakhutdinov
Date Published:
Journal Name:
Journal of Machine Learning Research
Volume:
23
ISSN:
1532-4435
Page Range / eLocation ID:
1 - 60
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    A new network with super-approximation power is introduced. This network is built with the Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron; hence, we call such networks Floor-ReLU networks. For any hyperparameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, we show that Floor-ReLU networks with width $\max\{d,\,5N+13\}$ and depth $64dL+3$ can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an approximation error $3\lambda d^{\alpha/2}N^{-\alpha L}$, where $\alpha\in(0,1]$ and $\lambda$ are the Hölder order and constant, respectively. More generally, for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $\omega_f(\cdot)$, the constructive approximation rate is $\omega_f(\sqrt{d}\,N^{-L})+2\omega_f(\sqrt{d})N^{-L}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $\omega_f(r)$ as $r\to 0$ is moderate (e.g., $\omega_f(r)\lesssim r^{\alpha}$ for Hölder continuous functions), since the major term to be considered in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$ within the modulus of continuity.
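    As a quick numerical reading of the stated bound, the sketch below evaluates the width $\max\{d,5N+13\}$, depth $64dL+3$, and error $3\lambda d^{\alpha/2}N^{-\alpha L}$ for a few input dimensions; the Hölder constant $\lambda$ and order $\alpha$ here are illustrative values, not taken from the paper.

```python
import numpy as np

# Evaluate the stated Floor-ReLU bound 3 * lam * d**(alpha/2) * N**(-alpha*L)
# for a Holder function, together with the required width and depth.
# The constants lam (Holder constant) and alpha (Holder order) are example
# values chosen only for illustration.
def floor_relu_bound(d, N, L, alpha=0.5, lam=1.0):
    width = max(d, 5 * N + 13)
    depth = 64 * d * L + 3
    error = 3 * lam * d ** (alpha / 2) * N ** (-alpha * L)
    return width, depth, error

for d in (10, 100, 1000):
    w, dep, err = floor_relu_bound(d, N=4, L=3)
    print(f"d={d:5d}  width={w:5d}  depth={dep:7d}  error={err:.2e}")
# The error grows only like d**(alpha/2), i.e. polynomially in d, which is
# the sense in which the rate avoids the curse of dimensionality.
```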
  2. We studied the least-squares ReLU neural network (LSNN) method for solving a linear advection-reaction equation with a discontinuous solution in [Z. Cai et al., J. Comput. Phys., 443 (2021), 110514]. The method is based on a least-squares formulation and uses a new class of approximating functions: ReLU neural network (NN) functions. A critical and additional component of the LSNN method, differing from other NN-based methods, is the introduction of a properly designed and physics-preserving discrete differential operator. In this paper, we study the LSNN method for problems with discontinuity interfaces. First, we show that ReLU NN functions with depth \(\lceil \log_2(d+1)\rceil+1\) can approximate any \(d\)-dimensional step function on a discontinuity interface generated by a vector field as streamlines with any prescribed accuracy. By decomposing the solution into continuous and discontinuous parts, we prove theoretically that the discretization error of the LSNN method using ReLU NN functions with depth \(\lceil \log_2(d+1)\rceil+1\) is mainly determined by the continuous part of the solution, provided that the solution jump is constant. Numerical results for both two- and three-dimensional test problems with various discontinuity interfaces show that the LSNN method with enough layers is accurate and does not exhibit the common Gibbs phenomenon along discontinuity interfaces.
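    To make the step-function claim concrete in the simplest case, here is a minimal sketch for \(d=1\), where the stated depth \(\lceil\log_2(d+1)\rceil+1\) equals \(2\) (one hidden layer plus the output layer, counting affine layers): a small ReLU network that matches the step function \(\mathbf{1}_{\{x\ge a\}}\) outside a transition band of width \(1/k\). The threshold \(a\) and steepness \(k\) are illustrative choices, not parameters from the paper.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def step_approx(x, a=0.5, k=100.0):
    """Depth-2 ReLU network (one hidden layer with two neurons) approximating
    the 1D step function 1_{x >= a}. The steepness k controls the width 1/k
    of the transition band; a and k are illustrative, not from the paper."""
    return relu(k * (x - a)) - relu(k * (x - a) - 1.0)

x = np.linspace(0.0, 1.0, 11)
print(np.round(step_approx(x), 3))   # ~0 left of a, ~1 right of a, ramp near a
```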
  3. While neural networks are used for classification tasks across domains, a long-standing open problem in machine learning is determining whether neural networks trained using standard procedures are consistent for classification, i.e., whether such models minimize the probability of misclassification for arbitrary data distributions. In this work, we identify and construct an explicit set of neural network classifiers that are consistent. Since effective neural networks in practice are typically both wide and deep, we analyze infinitely wide networks that are also infinitely deep. In particular, using the recent connection between infinitely wide neural networks and neural tangent kernels, we provide explicit activation functions that can be used to construct networks that achieve consistency. Interestingly, these activation functions are simple and easy to implement, yet differ from commonly used activations such as ReLU or sigmoid. More generally, we create a taxonomy of infinitely wide and deep networks and show that these models implement one of three well-known classifiers depending on the activation function used: 1) 1-nearest neighbor (model predictions are given by the label of the nearest training example); 2) majority vote (model predictions are given by the label of the class with the greatest representation in the training set); or 3) singular kernel classifiers (a set of classifiers containing those that achieve consistency). Our results highlight the benefit of using deep networks for classification tasks, in contrast to regression tasks, where excessive depth is harmful. 
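    The first two limiting classifiers in this taxonomy are simple enough to write out directly; the sketch below (data and labels are made up for illustration) implements the 1-nearest-neighbor and majority-vote rules that the corresponding infinitely wide and deep networks reduce to. The singular kernel case is omitted here.

```python
import numpy as np

# Two of the three limiting classifiers named in the abstract, written out
# directly; the third family (singular kernel classifiers) is not sketched.
def one_nearest_neighbor(X_train, y_train, x):
    """Predict the label of the closest training point."""
    i = np.argmin(np.linalg.norm(X_train - x, axis=1))
    return y_train[i]

def majority_vote(y_train, x):
    """Predict the most common training label, ignoring x entirely."""
    labels, counts = np.unique(y_train, return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.9]])
y = np.array([0, 1, 1])
print(one_nearest_neighbor(X, y, np.array([0.1, 0.2])))   # -> 0
print(majority_vote(y, np.array([0.1, 0.2])))             # -> 1
```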
  4. We give estimates from below for the error of approximation of a compact subset of a Banach space by the outputs of feed-forward neural networks with width W, depth l, and Lipschitz activation functions. We show that, modulo logarithmic factors, rates better than the entropy numbers' rates are possibly attainable only for neural networks whose depth l goes to infinity, and that there is no gain if we fix the depth and let the width W go to infinity.
  5. Krause, Andreas; Brunskill, Emma; Cho, Kyunghyun; Engelhardt, Barbara; Sabato, Sivan; Scarlett, Jonathan. (Ed.)
    The parameter space for any fixed architecture of feedforward ReLU neural networks serves as a proxy during training for the associated class of functions - but how faithful is this representation? It is known that many different parameter settings $$\theta$$ can determine the same function $$f$$. Moreover, the degree of this redundancy is inhomogeneous: for some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries. In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. We also describe a number of mechanisms through which hidden symmetries can arise, and empirically approximate the functional dimension of different network architectures at initialization. These experiments indicate that the probability that a network has no hidden symmetries decreases towards 0 as depth increases, while increasing towards 1 as width and input dimension increase. 
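    The two generic symmetries mentioned here, permutation of neurons within a layer and positive scaling of parameters at a neuron, can be checked numerically; the sketch below (network shape, widths, and the scaling constant c are arbitrary illustrative choices) verifies that both transformations leave the realized function unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(params, x):
    """Two-hidden-layer ReLU network; params = [(W1, b1), (W2, b2), (W3, b3)]."""
    h = relu(params[0][0] @ x + params[0][1])
    h = relu(params[1][0] @ h + params[1][1])
    return params[2][0] @ h + params[2][1]

rng = np.random.default_rng(1)
d, w = 3, 5                                   # input dimension and width (illustrative)
W1, b1 = rng.standard_normal((w, d)), rng.standard_normal(w)
W2, b2 = rng.standard_normal((w, w)), rng.standard_normal(w)
W3, b3 = rng.standard_normal((1, w)), rng.standard_normal(1)
theta = [(W1, b1), (W2, b2), (W3, b3)]

# Permutation symmetry: reorder the neurons of the first hidden layer.
P = np.eye(w)[rng.permutation(w)]
theta_perm = [(P @ W1, P @ b1), (W2 @ P.T, b2), (W3, b3)]

# Positive-scaling symmetry: scale one neuron's incoming weights by c > 0
# and its outgoing weights by 1/c; relu(c*z) = c*relu(z) keeps f unchanged.
c = 2.5
S = np.diag([c] + [1.0] * (w - 1))
theta_scale = [(S @ W1, S @ b1), (W2 @ np.linalg.inv(S), b2), (W3, b3)]

x = rng.standard_normal(d)
print(np.allclose(forward(theta, x), forward(theta_perm, x)))   # True
print(np.allclose(forward(theta, x), forward(theta_scale, x)))  # True
```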