 Award ID(s):
 2012652
 NSFPAR ID:
 10428248
 Date Published:
 Journal Name:
 Advances in Neural Information Processing Systems
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data. To bridge this gap, we propose to exploit the low-dimensional structures of real-world data sets and establish theoretical guarantees for convolutional residual networks (ConvResNets) in terms of function approximation and statistical recovery for binary classification problems. Specifically, given data lying on a $d$-dimensional manifold isometrically embedded in $\mathbb{R}^D$, we prove that if the network architecture is properly chosen, ConvResNets can (1) approximate Besov functions on manifolds with arbitrary accuracy, and (2) learn a classifier by minimizing the empirical logistic risk, which gives an excess risk in the order of $n^{-2s/(2s+d)}$, where $s$ is a smoothness parameter. This implies that the sample complexity depends on the intrinsic dimension $d$, instead of the data dimension $D$. Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
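To make the setup concrete, below is a minimal sketch, not the paper's exact architecture: a generic 1-D convolutional residual network trained by minimizing the empirical logistic risk for binary classification. The channel width, depth, and the synthetic data are placeholder assumptions for illustration only.

```python
# Hedged sketch: generic ConvResNet-style classifier trained on the
# empirical logistic risk. Widths, depths, and data are illustrative.
import torch
import torch.nn as nn

class ConvResBlock(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )

    def forward(self, x):
        # Identity skip connection: the defining feature of a residual block.
        return torch.relu(x + self.body(x))

class ConvResNet(nn.Module):
    def __init__(self, D: int, channels: int = 16, num_blocks: int = 4):
        super().__init__()
        self.lift = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ConvResBlock(channels) for _ in range(num_blocks)])
        self.head = nn.Linear(channels * D, 1)  # scalar logit output

    def forward(self, x):                # x: (batch, D), ambient data in R^D
        h = self.lift(x.unsqueeze(1))    # treat coordinates as a 1-D signal
        h = self.blocks(h)
        return self.head(h.flatten(1)).squeeze(-1)

# Empirical logistic risk minimization on toy data (illustration only).
D, n = 32, 256
X = torch.randn(n, D)
y = (X[:, 0] > 0).float()                # placeholder labels
model = ConvResNet(D)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()         # logistic loss
for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```

The theory above concerns how the size of such a network must scale to achieve a given accuracy; the point of the result is that the scaling is governed by the intrinsic dimension $d$ of the data manifold, not the ambient dimension $D$.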

Two-sample tests aim to determine whether two collections of observations follow the same distribution. We propose two-sample tests based on integral probability metrics (IPMs) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $n$ and the structure of the manifold with intrinsic dimension $d$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves a type-II risk in the order of $n^{-1/\max\{d,2\}}$. When an atlas is not given, we propose a Hölder IPM test that applies to data distributions with $(s,\beta)$-Hölder densities, which achieves a type-II risk in the order of $n^{-(s+\beta)/d}$. To mitigate the heavy computational burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test has a type-II risk in the order of $n^{-(s+\beta)/d}$, the same order as the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance crucially depends on the intrinsic dimension instead of the data dimension.
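As a rough illustration of a neural network IPM statistic, here is a hedged sketch under simplifying assumptions: an unconstrained small feedforward "witness" network stands in for the Hölder function class, and the test is calibrated by a permutation procedure, which is not necessarily the calibration analyzed in the paper.

```python
# Sketch of a neural-network IPM two-sample statistic: approximate
# sup_f [ mean f(X) - mean f(Y) ] by gradient ascent over a small witness net.
import torch
import torch.nn as nn

def neural_ipm(X, Y, steps=200, lr=1e-2, width=64):
    f = nn.Sequential(nn.Linear(X.shape[1], width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        gap = f(X).mean() - f(Y).mean()   # empirical mean discrepancy
        (-gap).backward()                  # ascend on the gap
        opt.step()
    with torch.no_grad():
        return (f(X).mean() - f(Y).mean()).item()

def permutation_pvalue(X, Y, num_perm=20):
    """Calibrate the statistic by randomly re-splitting the pooled sample."""
    stat = neural_ipm(X, Y)
    Z, n = torch.cat([X, Y]), X.shape[0]
    exceed = 0
    for _ in range(num_perm):
        idx = torch.randperm(Z.shape[0])
        if neural_ipm(Z[idx[:n]], Z[idx[n:]]) >= stat:
            exceed += 1
    return (exceed + 1) / (num_perm + 1)

# Toy usage: two 10-dimensional Gaussian samples with shifted means.
X, Y = torch.randn(200, 10), torch.randn(200, 10) + 0.5
print(permutation_pvalue(X, Y))
```

The statistic above reuses the same samples for fitting the witness and evaluating the discrepancy; a sample-splitting or permutation step, as sketched, is one common way to keep the type-I error under control.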

Deep neural networks have revolutionized many real-world applications, due to their flexibility in data fitting and accurate predictions for unseen data. A line of research reveals that neural networks can approximate certain classes of functions with arbitrary accuracy, while the size of the network scales exponentially with respect to the data dimension. Empirical results, however, suggest that networks of moderate size already yield appealing performance. To explain such a gap, a common belief is that many data sets exhibit low-dimensional structures and can be modeled as samples near a low-dimensional manifold. In this paper, we prove that neural networks can efficiently approximate functions supported on low-dimensional manifolds. The network size scales exponentially in the approximation error, with an exponent depending on the intrinsic dimension of the data and the smoothness of the function. Our result shows that exploiting low-dimensional data structures can greatly enhance the efficiency of function approximation by neural networks. We also implement a subnetwork that assigns input data to their corresponding local neighborhoods, which may be of independent interest.
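The chart-assignment idea can be illustrated with a small, hedged sketch: nearest-center routing on a toy manifold. The actual construction in the paper is a ReLU subnetwork, so the function below is only a stand-in for what that subnetwork computes, and the chart centers are illustrative assumptions.

```python
# Sketch: route points near a low-dimensional manifold to local neighborhoods.
import torch

def assign_to_charts(X, centers):
    """Index of the nearest chart center for each row of X."""
    return torch.cdist(X, centers).argmin(dim=1)

# Toy example: a circle (intrinsic dimension d = 1) embedded in R^3 (D = 3).
theta = torch.rand(500) * 2 * torch.pi
X = torch.stack([theta.cos(), theta.sin(), torch.zeros_like(theta)], dim=1)
centers = torch.tensor([[1., 0., 0.], [-1., 0., 0.], [0., 1., 0.], [0., -1., 0.]])
chart_id = assign_to_charts(X, centers)   # neighborhood labels in {0, 1, 2, 3}
# A separate small network can then approximate the target function on each chart.
```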
