NSF PAR Search | NSF Public Access Repository

Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories

Zhang, Zixuan; Chen, Minshuo; Wang, Mengdi; Liao, Wenjing; Zhao, Tuo (July 2025, Proceedings of the 40th International Conference on Machine Learning, PMLR 202:40911-40931, 2023.)

Free, publicly accessible full text available July 3, 2026
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data

Havrilla, Alex; Liao, Wenjing (January 2025, Advances in Neural Information Processing Systems 37 (NeurIPS 2024) Main Conference Track)

When training deep neural networks, a model's generalization error is often observed to follow a power scaling law that depends on both the model size and the data size. Perhaps the best-known examples of such scaling laws are for transformer-based large language models (LLMs), where networks with billions of parameters are trained on trillions of tokens of text. Yet, despite sustained and widespread interest, a rigorous understanding of why transformer scaling laws exist is still missing. To address this question, we establish novel statistical estimation and mathematical approximation theories for transformers when the input data are concentrated on a low-dimensional manifold. Our theory predicts a power law between the generalization error and both the training data size and the network size for transformers, where the power depends on the intrinsic dimension d of the training data. Notably, the constructed model architecture is shallow, requiring only logarithmic depth in d. By leveraging low-dimensional data structures under a manifold hypothesis, we are able to explain transformer scaling laws in a way that respects the data geometry. Moreover, we test our theory empirically by training LLMs on natural language datasets, and we find that the observed scaling laws closely agree with our theoretical predictions. Taken together, these results rigorously show the intrinsic dimension of data to be a crucial quantity affecting transformer scaling laws in both theory and practice.
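A power scaling law of the form error ≈ C · n^(-alpha) becomes a straight line in log-log coordinates, so a common generic way to read an empirical exponent off measured (size, error) pairs is ordinary least squares on log(error) versus log(n). The sketch below illustrates only that generic step. It is not the authors' fitting procedure, it does not reproduce the paper's theoretical exponent in terms of d, and the function name `fit_power_law` and the synthetic data (generated from an assumed law with alpha = 0.25) are hypothetical.

```python
import numpy as np


def fit_power_law(sizes, errors):
    """Fit errors ~= C * sizes**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; returns (C, alpha).
    """
    sizes = np.asarray(sizes, dtype=float)
    errors = np.asarray(errors, dtype=float)
    if np.any(sizes <= 0) or np.any(errors <= 0):
        raise ValueError("sizes and errors must be positive for a log-log fit")
    # log(err) = log(C) - alpha * log(n) is linear in log(n);
    # np.polyfit with deg=1 returns [slope, intercept].
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return float(np.exp(intercept)), float(-slope)


# Synthetic, made-up data from an assumed law err = 3.0 * n**(-0.25),
# with small multiplicative noise, used only to exercise the fit.
rng = np.random.default_rng(0)
n = np.logspace(3, 7, num=9)
err = 3.0 * n ** (-0.25) * np.exp(rng.normal(0.0, 0.01, size=n.shape))

C_hat, alpha_hat = fit_power_law(n, err)
print(f"C ~= {C_hat:.3f}, alpha ~= {alpha_hat:.3f}")
```

The same routine applies whether the x-axis is training data size or parameter count; in practice one would fit each axis while holding the other large enough not to be the bottleneck.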
Free, publicly accessible full text available January 31, 2026
Generalization error guaranteed auto-encoder-based nonlinear model reduction for operator learning

https://doi.org/10.1016/j.acha.2024.101717

Liu, Hao; Dahal, Biraj; Lai, Rongjie; Liao, Wenjing (January 2025, Applied and Computational Harmonic Analysis)
Fornasier, Massimo (Ed.)
Full Text Available
Learning Functions Varying along a Central Subspace

https://doi.org/10.1137/23M1557751

Liu, Hao; Liao, Wenjing (June 2024, SIAM Journal on Mathematics of Data Science)
Deep nonparametric estimation of intrinsic data structures by chart autoencoders: Generalization error and robustness

https://doi.org/10.1016/j.acha.2023.101602

Liu, Hao; Havrilla, Alex; Lai, Rongjie; Liao, Wenjing (January 2024, Applied and Computational Harmonic Analysis)

Full Text Available
Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces

Liu, Hao; Yang, Haizhao; Chen, Minshuo; Zhao, Tuo; Liao, Wenjing (January 2024, Journal of Machine Learning Research)
Raginsky, Maxim (Ed.)

Learning operators between infinite-dimensional spaces is an important task arising in machine learning, imaging science, mathematical modeling, and simulation. This paper studies the nonparametric estimation of Lipschitz operators using deep neural networks. Non-asymptotic upper bounds are derived for the generalization error of the empirical risk minimizer over a properly chosen network class. Under the assumption that the target operator exhibits a low-dimensional structure, our error bounds decay as the training sample size increases, at a fast rate governed by the intrinsic dimension. Our assumptions cover most scenarios in real applications, and our results yield fast rates by exploiting low-dimensional structures of data in operator estimation. We also investigate the influence of network structure (e.g., width, depth, and sparsity) on the generalization error of the neural network estimator and provide quantitative guidance on choosing the network structure to maximize learning efficiency.
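To see why a rate governed by the intrinsic dimension matters, it can help to run the arithmetic on a simpler, classical benchmark: for squared-error regression of a Lipschitz function in d dimensions, the minimax rate is n^(-2/(2+d)). The sketch below uses that classical function-regression rate as a stand-in; it is not the operator-learning bound derived in the paper, it ignores constants and logarithmic factors, and the helper names and the dimensions 4 and 100 are hypothetical choices for illustration.

```python
import math


def lipschitz_rate_exponent(dim: int) -> float:
    """Exponent r in the classical rate n**(-r) for squared-error regression
    of a Lipschitz (smoothness 1) function in `dim` dimensions: r = 2 / (2 + dim).
    Hypothetical helper; not the paper's operator-learning rate."""
    if dim < 1:
        raise ValueError("dimension must be a positive integer")
    return 2.0 / (2.0 + dim)


def samples_for_error(eps: float, dim: int) -> float:
    """Sample size n solving n**(-r) = eps, ignoring constants and log factors."""
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return eps ** (-1.0 / lipschitz_rate_exponent(dim))


# Hypothetical comparison: a small intrinsic dimension vs a large ambient one.
for d in (4, 100):
    n = samples_for_error(0.1, d)
    print(f"d = {d:3d}: rate exponent {lipschitz_rate_exponent(d):.4f}, "
          f"n ~= 10^{math.log10(n):.1f} for error 0.1")
```

Under these assumptions, reaching error 0.1 takes about 10^3 samples when the rate depends on d = 4 but about 10^51 when it depends on d = 100, which is the gap that low-dimensional structure is meant to close.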
Full Text Available
Dual Fourier Unet: scale-robust diffusion model for zero-shot super-resolution image generation

Havrilla, Alexander; Rojas, Kevin; Liao, Wenjing; Tao, Molei (December 2023, NeurIPS 2023 Workshop on Diffusion Models)
