We consider numerical approaches for deterministic, finite-dimensional optimal control problems whose dynamics depend on unknown or uncertain parameters. We seek to amortize the solution over a set of relevant parameters in an offline stage to enable rapid decision-making and the ability to react to parameter changes in the online stage. To tackle the curse of dimensionality arising when the state and/or parameter are high-dimensional, we represent the policy using neural networks. We compare two training paradigms: First, our model-based approach leverages the dynamics and definition of the objective function to learn the value function of the parameterized optimal control problem and obtain the policy using a feedback form. Second, we use actor-critic reinforcement learning to approximate the policy in a data-driven way. Using an example involving a two-dimensional convection-diffusion equation, which features high-dimensional state and parameter spaces, we investigate the accuracy and efficiency of both training paradigms. While both paradigms lead to a reasonable approximation of the policy, the model-based approach is more accurate and considerably reduces the number of PDE solves.
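As a minimal illustration of the feedback form mentioned above: for control-affine dynamics with a quadratic control cost, the optimal control can be written as u*(x) = -R^{-1} B^T grad_x V(x, p). The sketch below substitutes a hypothetical quadratic value function for the trained network; all names (`value`, `B`, `R_inv`) are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Hypothetical quadratic value function V(x, p) = 0.5 * p * ||x||^2 standing
# in for the trained network; p is the problem parameter.
def value(x, p):
    return 0.5 * p * np.dot(x, x)

def grad_value(x, p, eps=1e-6):
    # Central finite differences; with a neural network one would use autodiff.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (value(x + e, p) - value(x - e, p)) / (2 * eps)
    return g

def feedback_policy(x, p, B, R_inv):
    # Feedback form for control-affine dynamics dx/dt = f(x) + B u with
    # quadratic control cost 0.5 u^T R u:  u*(x) = -R^{-1} B^T grad_x V(x, p).
    return -R_inv @ B.T @ grad_value(x, p)

x = np.array([1.0, -2.0])
u = feedback_policy(x, 2.0, np.eye(2), np.eye(2))
# For V = 0.5 * p * ||x||^2, grad_x V = p * x, so u = -p * x here.
```

Once the value function is learned offline, evaluating the policy online reduces to one gradient evaluation per state, which is where the savings in PDE solves come from.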
A Survey on Statistical Theory of Deep Learning: Approximation, Training Dynamics, and Generative Models
In this article, we review the literature on statistical theories of neural networks from three perspectives: approximation, training dynamics, and generative models. In the first part, results on excess risks for neural networks are reviewed in the nonparametric framework of regression. These results rely on explicit constructions of neural networks, leading to fast convergence rates of excess risks. Nonetheless, their underlying analysis only applies to the global minimizer in the highly nonconvex landscape of deep neural networks. This motivates us to review the training dynamics of neural networks in the second part. Specifically, we review articles that attempt to answer the question of how a neural network trained via gradient-based methods finds a solution that can generalize well on unseen data. In particular, two well-known paradigms are reviewed: the neural tangent kernel and mean-field paradigms. Last, we review the most recent theoretical advancements in generative models, including generative adversarial networks, diffusion models, and in-context learning in large language models from two of the same perspectives, approximation and training dynamics.
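The neural tangent kernel paradigm reviewed above can be illustrated with the empirical NTK of a toy network: K(x, x') = <grad_theta f(x; theta), grad_theta f(x'; theta)> evaluated at initialization. The one-hidden-layer architecture, width, and inputs below are hypothetical choices for the sketch, not drawn from any article under review.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny one-hidden-layer network f(x; W, a) = a^T tanh(W x), used only to
# illustrate the empirical neural tangent kernel (NTK).
W = rng.normal(size=(8, 2)) / np.sqrt(2)
a = rng.normal(size=8) / np.sqrt(8)

def param_grad(x, W, a):
    # Gradient of f with respect to all parameters (W, a), stacked flat:
    # df/da_j = tanh(w_j . x),  df/dW_jk = a_j (1 - tanh^2(w_j . x)) x_k.
    h = np.tanh(W @ x)
    dW = np.outer(a * (1 - h**2), x)
    return np.concatenate([dW.ravel(), h])

def empirical_ntk(x1, x2):
    # K(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>
    return param_grad(x1, W, a) @ param_grad(x2, W, a)

x1 = np.array([1.0, 0.0])
x2 = np.array([0.0, 1.0])
K = np.array([[empirical_ntk(x1, x1), empirical_ntk(x1, x2)],
              [empirical_ntk(x2, x1), empirical_ntk(x2, x2)]])
```

In the NTK regime, training a sufficiently wide network by gradient descent behaves like kernel regression with this (fixed) kernel, which is what makes the convergence analysis tractable.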
- Award ID(s): 2247795
- PAR ID: 10578323
- Publisher / Repository: Annual Review of Statistics and Its Application
- Date Published:
- Journal Name: Annual Review of Statistics and Its Application
- ISSN: 2326-8298
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Adolescence is a period of rapid biobehavioral change, characterized in part by increased neural maturation and sensitivity to one's environment. In this review, we aim to demonstrate that self-regulation skills are tuned by adolescents' social, cultural, and socioeconomic contexts. We discuss adjacent literatures that demonstrate the importance of experience-dependent learning for adolescent development: environmental contextual influences and training paradigms that aim to improve regulation skills. We first highlight changes in prominent limbic and cortical regions, like the amygdala and medial prefrontal cortex, as well as structural and functional connectivity between these areas that are associated with adolescents' regulation skills. Next, we consider how puberty, the hallmark developmental milestone in adolescence, helps instantiate these biobehavioral adaptations. We then survey the existing literature demonstrating the ways in which cultural, socioeconomic, and interpersonal contexts drive behavioral and neural adaptation for self-regulation. Finally, we highlight promising results from regulation training paradigms that suggest training may be especially efficacious for adolescent samples. In our conclusion, we highlight some exciting frontiers in human self-regulation research as well as recommendations for improving the methodological implementation of developmental neuroimaging studies and training paradigms.
- We present a supervised learning framework of training generative models for density estimation. Generative models, including generative adversarial networks (GANs), normalizing flows, and variational auto-encoders (VAEs), are usually considered as unsupervised learning models, because labeled data are usually unavailable for training. Despite the success of the generative models, there are several issues with the unsupervised training, e.g., requirement of reversible architectures, vanishing gradients, and training instability. To enable supervised learning in generative models, we utilize the score-based diffusion model to generate labeled data. Unlike existing diffusion models that train neural networks to learn the score function, we develop a training-free score estimation method. This approach uses mini-batch-based Monte Carlo estimators to directly approximate the score function at any spatial-temporal location in solving an ordinary differential equation (ODE), corresponding to the reverse-time stochastic differential equation (SDE). This approach can offer both high accuracy and substantial time savings in neural network training. Once the labeled data are generated, we can train a simple, fully connected neural network to learn the generative model in a supervised manner. Compared with existing normalizing flow models, our method does not require the use of reversible neural networks and avoids the computation of the Jacobian matrix. Compared with existing diffusion models, our method does not need to solve the reverse-time SDE to generate new samples. As a result, the sampling efficiency is significantly improved. We demonstrate the performance of our method by applying it to a set of 2D datasets as well as real data from the University of California Irvine (UCI) repository.
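A training-free Monte Carlo score estimator of the kind described above can be sketched as follows: under a variance-preserving forward SDE, the marginal density is a mixture of Gaussians centered at scaled data points, so its score is a softmax-weighted combination of per-sample terms. The noise schedule and sample count below are illustrative assumptions, not the authors' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(256, 2))  # hypothetical mini-batch of training samples

def mc_score(x, t, data,
             alpha=lambda t: np.exp(-t),          # assumed VP scaling
             sigma2=lambda t: 1.0 - np.exp(-2 * t)):  # assumed VP variance
    # Monte Carlo estimate of grad_x log p_t(x), where
    # p_t(x) = (1/N) sum_i N(x; alpha_t * x_i, sigma_t^2 I).
    a, s2 = alpha(t), sigma2(t)
    diffs = x - a * data                         # (N, d)
    logw = -np.sum(diffs**2, axis=1) / (2 * s2)  # unnormalized log weights
    w = np.exp(logw - logw.max())
    w /= w.sum()                                 # softmax over batch samples
    # score = sum_i w_i * (alpha_t * x_i - x) / sigma_t^2
    return (w[:, None] * (-diffs)).sum(axis=0) / s2

s = mc_score(np.array([0.5, -0.3]), 10.0, data)
# For large t the marginal is close to N(0, I), whose score is -x.
```

No network is trained to produce this estimate; it is evaluated on demand while integrating the reverse-time ODE, which is the sense in which the score estimation is "training-free."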
- Abstract Advancements in computing power have recently made it possible to utilize machine learning and deep learning to push scientific computing forward in a range of disciplines, such as fluid mechanics, solid mechanics, materials science, etc. The incorporation of neural networks is particularly crucial in this hybridization process. Due to their intrinsic architecture, conventional neural networks cannot be successfully trained and scoped when data are sparse, which is the case in many scientific and engineering domains. Nonetheless, neural networks provide a solid foundation to respect physics-driven or knowledge-based constraints during training. Generally speaking, there are three distinct neural network frameworks to enforce the underlying physics: (i) physics-guided neural networks (PgNNs), (ii) physics-informed neural networks (PiNNs), and (iii) physics-encoded neural networks (PeNNs). These methods provide distinct advantages for accelerating the numerical modeling of complex multiscale multiphysics phenomena. In addition, the recent developments in neural operators (NOs) add another dimension to these new simulation paradigms, especially when the real-time prediction of complex multiphysics systems is required. All these models also come with their own unique drawbacks and limitations that call for further fundamental research. This study aims to present a review of the four neural network frameworks (i.e., PgNNs, PiNNs, PeNNs, and NOs) used in scientific computing research. The state-of-the-art architectures and their applications are reviewed, limitations are discussed, and future research opportunities are presented in terms of improving algorithms, considering causalities, expanding applications, and coupling scientific and deep learning solvers.
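The physics-informed idea (PiNNs) described above can be sketched with a dependency-free stand-in: penalize the residual of a differential equation at collocation points. A polynomial ansatz replaces the neural network here so the example stays self-contained; the ODE u'(t) = -u(t) with u(0) = 1 and the polynomial degree are illustrative choices.

```python
import numpy as np

# Physics-informed fit: approximate the solution of u'(t) = -u(t), u(0) = 1
# on [0, 1] with the ansatz u_c(t) = 1 + sum_k c_k t^(k+1), which satisfies
# the initial condition by construction. The "loss" is the ODE residual
# r(t) = u'(t) + u(t) at collocation points; a polynomial stands in for the
# network so the residual is linear in c and least squares solves it exactly.
deg = 5
t = np.linspace(0.0, 1.0, 50)  # collocation points

def u(c, t):
    return 1.0 + sum(c[k] * t**(k + 1) for k in range(deg))

# Residual r(t) = 1 + sum_k c_k [(k+1) t^k + t^(k+1)]; set A c = -1.
A = np.stack([(k + 1) * t**k + t**(k + 1) for k in range(deg)], axis=1)
b = -np.ones_like(t)
c, *_ = np.linalg.lstsq(A, b, rcond=None)

err = float(np.max(np.abs(u(c, t) - np.exp(-t))))  # vs. exact solution e^(-t)
```

With an actual PiNN the ansatz is a network, the residual involves automatic differentiation, and the least-squares solve becomes gradient-based training, but the structure of the loss is the same.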
- ABSTRACT Hill-type muscle models are widely used, even though they do not accurately represent the relationship between activation and force in dynamic contractions. We explored the use of neural networks as an alternative approach to capture features of dynamic muscle function, without a priori assumptions about force–length–velocity relationships. We trained neural networks using an existing dataset of two guinea fowl muscles to estimate muscle force from activation, fascicle length and velocity. Training data were recorded using sonomicrometry, electromyography and a tendon buckle. First, we compared the neural networks with Hill-type muscle models, using the same data for network training and model optimization. Second, we trained neural networks on larger datasets, in a more realistic machine learning scenario. We found that neural networks generally yielded higher coefficients of determination and lower errors than Hill-type muscle models. Neural networks performed better when estimating forces on the muscle used for training, but on another bird, than on a different muscle of the same bird, likely due to inaccuracies in activation and force scaling. We extracted force–length and force–velocity relationships from the trained neural networks and found that both effects were underestimated and the relationships were not well replicated outside the training data distribution. We discuss suggested experimental designs and the challenge of collecting suitable training data. Given a suitable training dataset, neural networks could provide a useful alternative to Hill-type muscle models, particularly for modeling muscle dynamics in faster movements; however, scaling of the training data should be comparable between muscles and animals.
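For reference, a Hill-type model of the kind the networks are compared against multiplies activation by separate force–length and force–velocity factors. The functional forms and constants below are generic textbook choices (Gaussian force–length curve, concentric Hill hyperbola), not the study's fitted parameters.

```python
import numpy as np

def hill_force(act, l_norm, v_norm, F_max=1.0):
    # act: activation in [0, 1]; l_norm: fascicle length / optimal length;
    # v_norm: shortening velocity / max shortening velocity, in [0, 1].
    # Active force-length: Gaussian centered at the optimal length.
    f_l = np.exp(-((l_norm - 1.0) / 0.45) ** 2)
    # Concentric force-velocity: Hill hyperbola, decreasing in v_norm.
    f_v = (1.0 - v_norm) / (1.0 + 4.0 * v_norm)
    return act * f_l * f_v * F_max
```

The multiplicative structure is exactly the a priori assumption the neural networks avoid: they map (activation, length, velocity) to force directly, without factoring the dependence.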