Title: Why shallow networks struggle to approximate and learn high frequencies
Abstract In this work, we present a comprehensive study combining mathematical and computational analysis to explain why a two-layer neural network struggles to handle high frequencies in both approximation and learning, especially when machine precision, numerical noise, and computational cost are significant factors in practice. Specifically, we investigate the following fundamental computational issues: (1) the minimal numerical error achievable under finite precision, (2) the computational cost required to attain a given accuracy, and (3) the stability of the method with respect to perturbations. The core of our analysis lies in the conditioning of the representation and its learning dynamics. Explicit answers to these questions are provided, along with supporting numerical evidence.
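To make the conditioning point concrete, the following minimal Python sketch (an illustration only, not the paper's exact construction) fixes a random ReLU hidden layer of a two-layer network and fits only the output weights by least squares. The feature matrix is badly conditioned, and the achievable fit deteriorates rapidly as the target frequency grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 512, 256                        # sample points, hidden neurons
x = np.linspace(0.0, 1.0, n)[:, None]

# Two-layer network with a fixed random hidden layer: only the output
# weights are fit, by least squares on the ReLU feature matrix Phi.
W = rng.normal(size=(1, m))
b = rng.normal(size=m)
Phi = np.maximum(x @ W + b, 0.0)       # shape (n, m)

print(f"cond(Phi) = {np.linalg.cond(Phi):.2e}")

for freq in (1, 4, 16, 64):
    y = np.sin(2 * np.pi * freq * x[:, 0])
    c, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    rel_err = np.linalg.norm(Phi @ c - y) / np.linalg.norm(y)
    print(f"frequency {freq:3d}: relative fit error = {rel_err:.2e}")
```

In exact arithmetic the fit could be improved by widening the network, but under finite precision the large condition number caps the attainable accuracy, which is the regime the paper analyzes.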
Award ID(s):
2309530 2309551 2012860
PAR ID:
10617590
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
14
Issue:
3
ISSN:
2049-8772
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The degree of rate control (DRC) quantitatively identifies the kinetically relevant (sometimes called rate-limiting) steps of a complex reaction network. The concept relies on derivatives, which are commonly implemented numerically, for example with finite differences (FDs). Numerical derivatives are tedious to implement and can be problematic, unstable, or unreliable. In this study, we demonstrate the use of automatic differentiation (AD) in the evaluation of the DRC. AD libraries are increasingly available through modern machine learning frameworks. Compared with FDs, AD provides solutions with higher accuracy at lower computational cost. We demonstrate applications in steady-state and transient kinetics. Furthermore, we illustrate a hybrid local-global sensitivity analysis method, the distributed evaluation of local sensitivity analysis, to assess the importance of kinetic parameters over an uncertain space; this method also benefits from AD to obtain high-quality results efficiently.
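As a concrete illustration of the AD approach, consider a toy two-step mechanism with a quasi-steady-state rate expression (a sketch only; it differentiates each rate constant independently, glossing over the fixed-equilibrium-constant convention of the textbook DRC definition). JAX evaluates the DRC as an exact gradient of ln r with respect to ln k, with no finite-difference stencil:

```python
import jax
import jax.numpy as jnp

def rate(log_k, c_A=1.0):
    """Quasi-steady-state rate of the toy mechanism A <-> I -> B.
    log_k = (ln k1, ln k_minus1, ln k2). Illustrative, not from the paper."""
    k1, k_m1, k2 = jnp.exp(log_k)
    return k1 * k2 * c_A / (k_m1 + k2)

log_k = jnp.log(jnp.array([1.0, 10.0, 0.1]))

# Degree of rate control X_i = d ln r / d ln k_i, computed as an exact
# AD gradient of the log-rate.
drc = jax.grad(lambda lk: jnp.log(rate(lk)))(log_k)
print(dict(zip(["k1", "k-1", "k2"], drc.tolist())))
```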
  2. Abstract Active learning is a subfield of machine learning that focuses on improving data collection efficiency in expensive-to-evaluate systems. Surrogate modeling with active learning enables cost-efficient analysis of demanding engineering systems, but heterogeneity in the underlying system can degrade its performance. In this article, we propose partitioned active learning, which quantifies the informativeness of new design points while circumventing the heterogeneity in the system. The proposed method partitions the design space based on heterogeneous features and searches for the next design point in two systematic steps: a global search that accelerates exploration by identifying the most uncertain subregion, and a local search that exploits the localized information induced by the local Gaussian process (GP) of that subregion. We also propose Cholesky-update-driven numerical remedies to address the computational complexity of the approach. The proposed method consistently outperforms existing active learning methods in three real-world cases, with better predictions and lower computation time.
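A minimal sketch of the two-step search, with a fixed two-piece partition and scikit-learn GPs standing in for the paper's data-driven partitioning and Cholesky-update machinery:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def f(x):                        # heterogeneous toy response: wiggly, then flat
    return np.where(x < 0.5, np.sin(20.0 * x), 0.1 * x)

# Fixed two-piece partition standing in for the paper's data-driven one.
regions = [(0.0, 0.5), (0.5, 1.0)]
X = np.linspace(0.05, 0.95, 8)[:, None]
y = f(X[:, 0])

for step in range(10):
    best_std, x_new = -1.0, None
    for lo, hi in regions:
        mask = (X[:, 0] >= lo) & (X[:, 0] < hi)
        gp = GaussianProcessRegressor(normalize_y=True).fit(X[mask], y[mask])
        grid = np.linspace(lo, hi, 200)[:, None]
        _, std = gp.predict(grid, return_std=True)
        # Global step: pick the most uncertain subregion;
        # local step: pick its most uncertain point.
        if std.max() > best_std:
            best_std, x_new = std.max(), grid[std.argmax()]
    X = np.vstack([X, x_new[None, :]])
    y = np.append(y, f(x_new[0]))

print(f"{len(X)} samples; last point added at x = {x_new[0]:.3f}")
```

Most new samples land in the wiggly subregion, where the local GP remains uncertain the longest.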
  3. In this paper, a computationally efficient data-driven hybrid automaton model is proposed to capture the behavior of unknown complex dynamical systems using multiple neural networks. The sampled data of the system are divided, via valid partitions, into groups corresponding to their topologies, based on which transition guards are defined. Then a collection of computationally efficient small-scale neural networks is trained to serve as the local dynamical description of each corresponding topology. After the system is modeled as a neural-network-based hybrid automaton, a set-valued reachability analysis with low computational cost is provided, based on interval analysis and a split-and-combine process. Finally, a numerical example involving a limit cycle is presented to illustrate that the developed models can significantly reduce the computational cost of reachable set computation without sacrificing modeling precision.
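The interval-analysis ingredient can be sketched as follows (a toy single-mode example with random weights standing in for a trained local network; the split-and-combine refinement and transition guards are omitted):

```python
import numpy as np

def affine_bounds(W, b, lo, hi):
    """Exact interval image of the box [lo, hi] under x -> W @ x + b."""
    center, radius = (lo + hi) / 2.0, (hi - lo) / 2.0
    c = W @ center + b
    r = np.abs(W) @ radius
    return c - r, c + r

def reachable_box(weights, biases, lo, hi):
    """Over-approximate NN([lo, hi]) for a ReLU network by interval analysis."""
    for W, b in zip(weights[:-1], biases[:-1]):
        lo, hi = affine_bounds(W, b, lo, hi)
        lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)  # ReLU is monotone
    return affine_bounds(weights[-1], biases[-1], lo, hi)

# Toy local model: a 2-16-2 network with random weights standing in for a
# trained local dynamics network x_{k+1} = NN(x_k).
rng = np.random.default_rng(0)
Ws = [rng.normal(scale=0.3, size=(16, 2)), rng.normal(scale=0.3, size=(2, 16))]
bs = [rng.normal(scale=0.1, size=16), rng.normal(scale=0.1, size=2)]

lo, hi = np.array([0.9, -0.1]), np.array([1.1, 0.1])   # initial set
for _ in range(3):                                     # three reachability steps
    lo, hi = reachable_box(Ws, bs, lo, hi)
    print(np.round(lo, 3), np.round(hi, 3))
```

Because each network is small, the interval propagation is cheap; the paper's split-and-combine step then subdivides boxes to tighten the over-approximation.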
  4. This study introduces a novel method to enhance the accuracy of numerical simulations of high-speed flows by refining the weighted essentially non-oscillatory (WENO) flux with higher-order corrections such as the modified weighted compact scheme (MWCS). Numerical experiments demonstrate sharper shock capturing and improved stability in demanding configurations such as two interacting blast waves. Key highlights include the simultaneous capture of small-scale smooth fluctuations and shock waves with precision surpassing the original WENO and MWCS methods. Despite the significantly improved accuracy, the extra computational cost of the new method is only marginal compared with the original WENO, and it outperforms MWCS in both accuracy and efficiency. Overall, the method enhances simulation fidelity and effectively balances accuracy and computational efficiency across a range of problems.
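For reference, the classic fifth-order Jiang-Shu WENO reconstruction that such methods refine can be written in a few lines (this is only the baseline building block; the MWCS-style correction of the paper is not reproduced here):

```python
import numpy as np

def weno5_left(v):
    """Fifth-order WENO (Jiang-Shu) reconstruction of the left-biased interface
    value v_{i+1/2} from five cell averages
    v = (v_{i-2}, v_{i-1}, v_i, v_{i+1}, v_{i+2})."""
    eps = 1e-6
    # Smoothness indicators of the three candidate stencils.
    b0 = 13/12 * (v[0] - 2*v[1] + v[2])**2 + 1/4 * (v[0] - 4*v[1] + 3*v[2])**2
    b1 = 13/12 * (v[1] - 2*v[2] + v[3])**2 + 1/4 * (v[1] - v[3])**2
    b2 = 13/12 * (v[2] - 2*v[3] + v[4])**2 + 1/4 * (3*v[2] - 4*v[3] + v[4])**2
    # Third-order candidate reconstructions.
    p0 = (2*v[0] - 7*v[1] + 11*v[2]) / 6
    p1 = ( -v[1] + 5*v[2] +  2*v[3]) / 6
    p2 = (2*v[2] + 5*v[3] -    v[4]) / 6
    # Nonlinear weights: ideal weights (0.1, 0.6, 0.3) damped near discontinuities.
    a = np.array([0.1, 0.6, 0.3]) / (eps + np.array([b0, b1, b2]))**2
    w = a / a.sum()
    return w @ np.array([p0, p1, p2])

print(weno5_left(np.array([1.0, 1.0, 1.0, 0.0, 0.0])))  # shock: upwind, no overshoot
print(weno5_left(np.sin(0.5 * np.arange(5))))           # smooth data: ~5th order
```

At the step in the shock example, the smoothness indicator of the fully upwind stencil vanishes, so its weight dominates and the reconstruction avoids oscillation.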
  5. Abstract Data analyses in particle physics rely on an accurate simulation of particle collisions and a detailed simulation of detector effects to extract physics knowledge from the recorded data. Event generators together with a Geant-based simulation of the detectors are used to produce large samples of simulated events for analysis by the LHC experiments. These simulations come at a high computational cost, with the detector simulation and reconstruction algorithms placing the largest CPU demands. This article describes how machine-learning (ML) techniques are used to reweight simulated samples obtained with a given set of parameters to samples with different parameters or samples obtained from entirely different simulation programs. The ML reweighting method avoids the need to simulate the detector response multiple times by incorporating the relevant information in a single sample through event weights. Results are presented for reweighting to model variations and higher-order calculations in simulated top quark pair production at the LHC. This ML-based reweighting is an important element of the future computing model of the CMS experiment and will facilitate precision measurements at the High-Luminosity LHC.
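The core of classifier-based reweighting is the standard likelihood-ratio trick: train a classifier to separate the two samples, and use s/(1-s) as a per-event weight. A minimal sketch with one-dimensional Gaussian stand-ins for event features and a gradient-boosted classifier in place of the neural networks typically used:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
nominal = rng.normal(0.0, 1.0, size=(20000, 1))  # events from simulation A
target  = rng.normal(0.3, 1.2, size=(20000, 1))  # events from simulation B

X = np.vstack([nominal, target])
label = np.r_[np.zeros(len(nominal)), np.ones(len(target))]

# The classifier output approximates p(target | x), so s / (1 - s)
# approximates the density ratio p_target(x) / p_nominal(x).
clf = GradientBoostingClassifier().fit(X, label)
s = clf.predict_proba(nominal)[:, 1]
w = s / (1.0 - s)

print("nominal mean:           ", nominal.mean())
print("reweighted nominal mean:", np.average(nominal[:, 0], weights=w))
print("target mean:            ", target.mean())
```

The reweighted nominal sample reproduces the target distribution without a second pass through the expensive detector simulation, which is the cost saving the abstract describes.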