Title: A manifold two-sample test study: integral probability metric with neural networks
Abstract Two-sample tests aim to determine whether two collections of observations follow the same distribution. We propose two-sample tests based on the integral probability metric (IPM) for high-dimensional samples supported on a low-dimensional manifold. We characterize the properties of the proposed tests with respect to the number of samples $$n$$ and the structure of the manifold with intrinsic dimension $$d$$. When an atlas is given, we propose a two-step test to identify the difference between general distributions, which achieves a type-II risk of order $$n^{-1/\max \{d,2\}}$$. When an atlas is not given, we propose a Hölder IPM test that applies to data distributions with $$(s,\beta )$$-Hölder densities and achieves a type-II risk of order $$n^{-(s+\beta )/d}$$. To mitigate the heavy computational burden of evaluating the Hölder IPM, we approximate the Hölder function class using neural networks. Based on the approximation theory of neural networks, we show that the neural network IPM test attains a type-II risk of the same order, $$n^{-(s+\beta )/d}$$, as the Hölder IPM test. Our proposed tests are adaptive to low-dimensional geometric structure because their performance depends crucially on the intrinsic dimension instead of the data dimension.
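As a concrete illustration of the neural network IPM test, here is a minimal sketch (not the authors' implementation): a small bounded network $$f$$ is trained to maximize the empirical mean discrepancy between the two samples, the maximized discrepancy serves as the test statistic, and the rejection threshold is calibrated by permutation. The architecture, the Tanh output bound, and all hyperparameters below are illustrative assumptions.

```python
# Sketch of a neural-network IPM two-sample test (illustrative, not the
# paper's construction).  The statistic approximates
#     sup_f  mean f(X) - mean f(Y)
# over a small bounded network class; H0: P = Q is rejected when the
# statistic exceeds a permutation quantile.
import torch
import torch.nn as nn

def ipm_statistic(X, Y, epochs=200, lr=1e-2):
    """Approximate the IPM between samples X, Y of shape (n, D)."""
    f = nn.Sequential(nn.Linear(X.shape[1], 64), nn.ReLU(),
                      nn.Linear(64, 1), nn.Tanh())   # Tanh keeps f bounded in [-1, 1]
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(epochs):
        loss = -(f(X).mean() - f(Y).mean())          # ascend the mean discrepancy
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        return (f(X).mean() - f(Y).mean()).item()

def ipm_two_sample_test(X, Y, n_perm=100, alpha=0.05):
    """Permutation-calibrated test; returns True if H0: P = Q is rejected."""
    stat, n = ipm_statistic(X, Y), X.shape[0]
    Z = torch.cat([X, Y])
    null_stats = []
    for _ in range(n_perm):
        idx = torch.randperm(Z.shape[0])             # resplit the pooled sample
        null_stats.append(ipm_statistic(Z[idx[:n]], Z[idx[n:]]))
    return stat > torch.tensor(null_stats).quantile(1 - alpha).item()
```

Bounding the network output mirrors the boundedness of the function class over which the population IPM is defined; per the abstract, the resulting test's type-II risk scales with the intrinsic dimension $$d$$ rather than the ambient data dimension.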
Award ID(s):
2012652 2134037
PAR ID:
10422481
Publisher / Repository:
Oxford University Press
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
12
Issue:
3
ISSN:
2049-8772
Format(s):
Medium: X
Size(s):
p. 1867-1897
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Most existing statistical theories on deep neural networks have sample complexities cursed by the data dimension and therefore cannot well explain the empirical success of deep learning on high-dimensional data. To bridge this gap, we propose to exploit the low-dimensional structures of real-world datasets and establish theoretical guarantees for convolutional residual networks (ConvResNets) in terms of function approximation and statistical recovery for the binary classification problem. Specifically, given data lying on a $$d$$-dimensional manifold isometrically embedded in $$\mathbb{R}^D$$, we prove that if the network architecture is properly chosen, ConvResNets can (1) approximate Besov functions on manifolds with arbitrary accuracy, and (2) learn a classifier by minimizing the empirical logistic risk, which gives an excess risk of order $$n^{-2s/(2s+d)}$$, where $$s$$ is a smoothness parameter. This implies that the sample complexity depends on the intrinsic dimension $$d$$ instead of the data dimension $$D$$. Our results demonstrate that ConvResNets are adaptive to low-dimensional structures of data sets.
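    To fix ideas, here is a minimal sketch of the kind of architecture and training objective described above; it is not the paper's construction, and the channel counts, depth, and 1-D convolution layout are placeholder assumptions.

```python
# Minimal ConvResNet-style classifier sketch for binary classification,
# trained with the logistic loss; widths and depths are illustrative only.
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))  # skip connection

class ConvResNet(nn.Module):
    def __init__(self, D, channels=16, n_blocks=3):
        super().__init__()
        self.lift = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(n_blocks)])
        self.head = nn.Linear(channels * D, 1)   # one logit for binary classification
    def forward(self, x):                        # x: (batch, D) ambient-dimension inputs
        h = self.blocks(self.lift(x.unsqueeze(1)))
        return self.head(h.flatten(1)).squeeze(-1)

# Empirical logistic risk = binary cross-entropy on the logits:
#   loss = nn.BCEWithLogitsLoss()(model(X), labels.float())
```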
  2. When training deep neural networks, a model's generalization error is often observed to follow a power scaling law that depends on both the model size and the data size. Perhaps the best-known examples of such scaling laws are for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. Yet, despite sustained widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To answer this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. Our theory predicts a power law between the generalization error and both the training data size and the network size for transformers, where the power depends on the intrinsic dimension $$d$$ of the training data. Notably, the constructed model architecture is shallow, requiring only logarithmic depth in $$d$$. By leveraging low-dimensional data structures under a manifold hypothesis, we are able to explain transformer scaling laws in a way that respects the data geometry. Moreover, we test our theory against empirical observation by training LLMs on natural language datasets and find that the observed empirical scaling laws closely agree with our theoretical predictions. Taken together, these results rigorously show the intrinsic dimension of data to be a crucial quantity affecting transformer scaling laws in both theory and practice.
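    The power-law relationship described above, generalization error $$\approx c\, n^{-\gamma }$$ with $$\gamma$$ governed by the intrinsic dimension $$d$$, can be checked empirically by linear regression in log-log space. A minimal sketch, using clearly labeled synthetic measurements rather than real training runs:

```python
# Estimate a power-law exponent gamma from (data size, error) pairs by
# least-squares regression in log-log space: error ~ c * n**(-gamma).
import numpy as np

def fit_power_law(ns, errors):
    """Return (gamma, c) from fitting log(error) linearly against log(n)."""
    slope, intercept = np.polyfit(np.log(ns), np.log(errors), deg=1)
    return -slope, np.exp(intercept)

# Illustrative synthetic measurements (placeholders, not real runs):
ns = np.array([1e4, 1e5, 1e6, 1e7])
errors = 2.0 * ns ** (-0.4) * np.exp(0.02 * np.random.randn(4))  # noisy power law
gamma, c = fit_power_law(ns, errors)
print(f"estimated exponent: {gamma:.3f}")  # the theory ties such exponents to d
```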
  3. Set representation has become ubiquitous in deep learning for modeling the inductive bias of neural networks that are insensitive to the input order. DeepSets is the most widely used neural network architecture for set representation: each set element is embedded into a latent space of dimension $$L$$, a sum pooling produces a whole-set embedding, and the whole-set embedding is finally mapped to the output. In this work, we investigate the impact of the dimension $$L$$ on the expressive power of DeepSets. Previous analyses either oversimplified high-dimensional features to one-dimensional features or were limited to analytic activations, thereby diverging from practical use or resulting in an $$L$$ that grows exponentially with the set size $$N$$ and feature dimension $$D$$. To investigate the minimal $$L$$ that achieves sufficient expressive power, we present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE). We demonstrate that $$L$$ being $$\mathrm{poly}(N,D)$$ is sufficient for set representation using both embedding layers. We also provide a lower bound on $$L$$ for the LP embedding layer. Furthermore, we extend our results to permutation-equivariant set functions and to the complex field.
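    The pipeline described above (per-element embedding into dimension $$L$$, sum pooling, output map) can be sketched as follows; the ReLU embedding is an illustrative stand-in for the LP/LE layers analyzed in the paper.

```python
# Minimal DeepSets sketch: embed each element (dimension D -> L),
# sum-pool over the set, then map the pooled vector to the output.
# Layer choices are illustrative, not the paper's LP/LE constructions.
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    def __init__(self, D, L, out_dim=1):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(D, L), nn.ReLU())  # per-element embedding
        self.rho = nn.Linear(L, out_dim)                      # whole-set output map
    def forward(self, X):       # X: (batch, N, D), a batch of N-element sets
        return self.rho(self.phi(X).sum(dim=1))  # sum pooling => order insensitivity

# Permutation-invariance check: shuffling the set leaves the output unchanged.
model = DeepSets(D=5, L=32)
X = torch.randn(2, 7, 5)
perm = torch.randperm(7)
assert torch.allclose(model(X), model(X[:, perm]), atol=1e-5)
```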
  4. Abstract Let $${\mathrm{Diff}}_{0}(N)$$ denote the subgroup of diffeomorphisms that are homotopic to the identity. We show that if $$N$$ is a closed hyperbolic 4-manifold, then $$\pi _{0}{\mathrm{Diff}}_{0}(N)$$ is not finitely generated, with similar results holding topologically. This proves in dimension 4 results previously known for $$n$$-dimensional hyperbolic manifolds of dimension $$n\ge 11$$ by Farrell and Jones in 1989 and $$n\ge 10$$ by Farrell and Ontaneda in 2010. Our proof relies on the technical result that $$\pi _{0}{\mathrm{Homeo}}(S^{1}\times D^{3})$$ is not finitely generated, which extends to the topological category the smooth results of the authors. We also show that $$\pi _{n-4} {\mathrm{Homeo}}(S^{1} \times D^{n-1})$$ is not finitely generated for $$n \geq 4$$; in particular, $$\pi _{0}{\mathrm{Homeo}}(S^{1}\times D^{3})$$ is not finitely generated. These results are new for $$n=4,5$$ and $$7$$. We also introduce higher-dimensional barbell maps and establish some of their basic properties.
  5.
    Abstract We investigate the Hölder geometry of curves generated by iterated function systems (IFS) in a complete metric space. A theorem of Hata from 1985 asserts that every connected attractor of an IFS is locally connected and path-connected. We give a quantitative strengthening of Hata's theorem. First we prove that every connected attractor of an IFS is $$(1/s)$$-Hölder path-connected, where $$s$$ is the similarity dimension of the IFS. Then we show that every connected attractor of an IFS is parameterized by a $$(1/\alpha )$$-Hölder curve for all $$\alpha > s$$. At the endpoint $$\alpha = s$$, a theorem of Remes from 1998 already established that connected self-similar sets in Euclidean space that satisfy the open set condition are parameterized by $$(1/s)$$-Hölder curves. In a secondary result, we show how to promote Remes' theorem to self-similar sets in complete metric spaces, but in this setting require the attractor to have positive $$s$$-dimensional Hausdorff measure in lieu of the open set condition. To close the paper, we determine sharp Hölder exponents of parameterizations in the class of connected self-affine Bedford-McMullen carpets and build parameterizations of self-affine sponges. An interesting phenomenon emerges in the self-affine setting. While the optimal parameter $$s$$ for a self-similar curve in $$\mathbb{R}^n$$ is always at most the ambient dimension $$n$$, the optimal parameter $$s$$ for a self-affine curve in $$\mathbb{R}^n$$ may be strictly greater than $$n$$.
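    For reference, the Hölder condition used throughout this abstract is the standard one: a curve $$\gamma :[0,1]\to X$$ into a metric space $$(X,\rho )$$ is $$(1/s)$$-Hölder if there is a constant $$C>0$$ such that $$\rho (\gamma (x),\gamma (y)) \le C\,|x-y|^{1/s}$$ for all $$x,y\in [0,1]$$. A larger $$s$$ thus permits a rougher parameterization, consistent with the appearance of the similarity dimension $$s$$ in the exponents above.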