Title: Global convergence of neuron birth-death dynamics
Neural networks with a large number of units admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of “overparameterized” models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this work, we propose a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this non-local dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of units. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated.
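To give some intuition for what a neuronal birth-death step over a finite set of units might look like, here is a minimal sketch. It is not the scheme from the paper: the function name `birth_death_step`, the per-unit `potentials` score (lower is better), the jump probability `1 - exp(-rate * |excess| * dt)`, and the rule of replacing or overwriting a uniformly chosen partner unit to keep the population size fixed are all assumptions made for illustration.

```python
import math
import random


def birth_death_step(params, potentials, rate, dt, rng):
    """One hypothetical birth-death resampling step over n hidden units.

    params:     list of n parameter vectors, one per unit
    potentials: list of n scores, lower is better (assumed to be supplied)
    rate, dt:   assumed jump rate and time step
    rng:        a random.Random instance
    """
    n = len(params)
    new_params = [list(p) for p in params]
    mean_v = sum(potentials) / n
    for i in range(n):
        excess = potentials[i] - mean_v
        # Probability that unit i jumps within a time step of length dt.
        jump_prob = 1.0 - math.exp(-rate * abs(excess) * dt)
        if rng.random() >= jump_prob:
            continue
        j = rng.randrange(n)  # uniformly chosen partner unit
        if excess > 0:
            # Above-average score: unit i dies and is replaced by a copy of j.
            new_params[i] = list(new_params[j])
        else:
            # Below-average score: unit i is duplicated over unit j.
            new_params[j] = list(new_params[i])
    return new_params


rng = random.Random(0)
units = [[float(i), float(-i)] for i in range(6)]
scores = [0.0, 0.0, 0.0, 0.0, 0.0, 100.0]  # the last unit scores badly
units = birth_death_step(units, scores, rate=5.0, dt=1.0, rng=rng)
```

In this sketch the number of units never changes, and when all scores are equal the step leaves the parameters untouched. In practice such a step would presumably be interleaved with ordinary gradient updates.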
Award ID(s):
1845360
NSF-PAR ID:
10159675
Journal Name:
International Conference on Machine Learning
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Near-wall flow simulation remains a central challenge in aerodynamics modelling: Reynolds-averaged Navier–Stokes predictions of separated flows are often inaccurate, and large-eddy simulation (LES) can require prohibitively small near-wall mesh sizes. A deep learning (DL) closure model for LES is developed by introducing untrained neural networks into the governing equations and training in situ for incompressible flows around rectangular prisms at moderate Reynolds numbers. The DL-LES models are trained using adjoint partial differential equation (PDE) optimization methods to match, as closely as possible, direct numerical simulation (DNS) data. They are then evaluated out-of-sample – for aspect ratios, Reynolds numbers and bluff-body geometries not included in the training data – and compared with standard LES models. The DL-LES models outperform these models and are able to achieve accurate LES predictions on a relatively coarse mesh (downsampled from the DNS mesh by factors of four or eight in each Cartesian direction). We study the accuracy of the DL-LES model for predicting the drag coefficient, near-wall and far-field mean flow, and resolved Reynolds stress. A crucial challenge is that the LES quantities of interest are the steady-state flow statistics; for example, a time-averaged velocity component $\langle u_i \rangle(x) = \lim_{t \to \infty} \frac{1}{t} \int_0^t u_i(s,x)\, \mathrm{d}s$. Calculating the steady-state flow statistics therefore requires simulating the DL-LES equations over a large number of flow times through the domain. It is a non-trivial question whether an unsteady PDE model with a functional form defined by a deep neural network can remain stable and accurate on $t \in [0, \infty)$, especially when trained over comparatively short time intervals. 
Our results demonstrate that the DL-LES models are accurate and stable over long time horizons, which enables the estimation of the steady-state mean velocity, fluctuations and drag coefficient of turbulent flows around bluff bodies relevant to aerodynamics applications. 
  2. Physics-informed neural networks (PINNs) have been widely used to numerically approximate solutions of PDE problems. While PINNs have achieved good results for many partial differential equations, studies have shown that they do not perform well on phase-field models. In this paper, we partially address this issue by introducing modified PINNs. In particular, they are used to numerically approximate the Allen-Cahn-Ohta-Kawasaki (ACOK) equation with a volume constraint. Technically, the inverse of the Laplacian in the ACOK model presents many challenges to the baseline PINNs. To take the zero-mean condition of the inverse of the Laplacian, as well as the volume constraint, into consideration, we also introduce a separate neural network, which takes a second set of sampling points in the approximation process. Numerical results are shown to demonstrate the effectiveness of the modified PINNs. An additional benefit of this research is that the modified PINNs can also be applied to learn more general nonlocal phase-field models, even with an unknown nonlocal kernel. 
  3. We investigate the uniform reshuffling model for money exchanges: two agents picked uniformly at random redistribute their dollars between them. This stochastic dynamics is of mean-field type and eventually leads to an exponential distribution of wealth. To better understand this dynamics, we investigate its limit as the number of agents goes to infinity. We prove rigorously the so-called propagation of chaos, which links the stochastic dynamics to a (limiting) nonlinear partial differential equation (PDE). This deterministic description, which is well known in the literature, has a flavor of the classical Boltzmann equation arising from the statistical mechanics of dilute gases. We prove its convergence toward its exponential equilibrium distribution in the sense of relative entropy. 
  4. It is currently known how to characterize functions that neural networks can learn with SGD for two extremal parametrizations: neural networks in the linear regime, and neural networks with no structural constraints. However, for the main parametrization of interest (non-linear but regular networks), no tight characterization has yet been achieved, despite significant developments. We take a step in this direction by considering depth-2 neural networks trained by SGD in the mean-field regime. We consider functions on binary inputs that depend on a latent low-dimensional subspace (i.e., small number of coordinates). This regime is of interest since it is poorly understood how neural networks routinely tackle high-dimensional datasets and adapt to latent low-dimensional structure without suffering from the curse of dimensionality. Accordingly, we study SGD-learnability with O(d) sample complexity in a large ambient dimension d. Our main results characterize a hierarchical property, the merged-staircase property, that is both necessary and nearly sufficient for learning in this setting. We further show that non-linear training is necessary: for this class of functions, linear methods on any feature map (e.g., the NTK) are not capable of learning efficiently. The key tools are a new “dimension-free” dynamics approximation result that applies to functions defined on a latent space of low dimension, a proof of global convergence based on polynomial identity testing, and an improvement of lower bounds against linear methods for non-almost orthogonal functions. 
  5. In many mechanistic medical, biological, physical, and engineered spatiotemporal dynamic models, the numerical solution of partial differential equations (PDEs), especially for diffusion, fluid flow and mechanical relaxation, can make simulations impractically slow. Biological models of tissues and organs often require the simultaneous calculation of the spatial variation of concentration of dozens of diffusing chemical species. One clinical example where rapid calculation of a diffusing field is of use is the estimation of oxygen gradients in the retina, based on imaging of the retinal vasculature, to guide surgical interventions in diabetic retinopathy. Furthermore, the ability to predict blood perfusion and oxygenation may one day guide clinical interventions in diverse settings, from stent placement in treating heart disease to BOLD fMRI interpretation in evaluating cognitive function (Xie et al., 2019; Lee et al., 2020). Since the quasi-steady-state solutions required for fast-diffusing chemical species like oxygen are particularly computationally costly, we consider the use of a neural network to provide an approximate solution to the steady-state diffusion equation. Machine learning surrogates, neural networks trained to provide approximate solutions to such complicated numerical problems, can often provide speed-ups of several orders of magnitude compared to direct calculation. Surrogates of PDEs could enable use of larger and more detailed models than are possible with direct calculation and can make including such simulations in real-time or near-real-time workflows practical. Creating a surrogate requires running the direct calculation tens of thousands of times to generate training data and then training the neural network, both of which are computationally expensive. 
Often the practical applications of such models require thousands to millions of replica simulations, for example for parameter identification and uncertainty quantification, each of which gains speed from surrogate use and rapidly recovers the up-front costs of surrogate generation. We use a Convolutional Neural Network to approximate the stationary solution to the diffusion equation in the case of two equal-diameter, circular, constant-value sources located at random positions in a two-dimensional square domain with absorbing boundary conditions. Such a configuration caricatures the chemical concentration field of a fast-diffusing species like oxygen in a tissue with two parallel blood vessels in a cross section perpendicular to the two blood vessels. To improve convergence during training, we apply a training approach that uses roll-back to reject stochastic changes to the network that increase the loss function. The trained neural network approximation is about 1000 times faster than the direct calculation for individual replicas. Because different applications will have different criteria for acceptable approximation accuracy, we discuss a variety of loss functions and accuracy estimators that can help select the best network for a particular application. We briefly discuss some of the issues we encountered with overfitting, mismapping of the field values and the geometrical conditions that lead to large absolute and relative errors in the approximate solution. 
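The uniform reshuffling model described in item 3 is simple enough to simulate directly. The sketch below assumes that the chosen pair pools its wealth and splits it at a point drawn uniformly on [0, 1); that splitting rule, the function name `reshuffle`, and the parameters are illustrative assumptions, not details taken from the abstract.

```python
import random


def reshuffle(wealth, steps, rng):
    """Simulate `steps` pairwise exchanges on a copy of `wealth`."""
    wealth = list(wealth)
    n = len(wealth)
    for _ in range(steps):
        # Pick two distinct agents uniformly at random.
        i, j = rng.sample(range(n), 2)
        pooled = wealth[i] + wealth[j]
        # Assumed rule: split the pooled wealth at a uniform random point.
        share = rng.random()
        wealth[i] = share * pooled
        wealth[j] = pooled - wealth[i]
    return wealth


rng = random.Random(1)
final = reshuffle([1.0] * 1000, steps=50_000, rng=rng)
```

Each exchange conserves the pair's total, so the mean wealth stays fixed. A histogram of `final` can be compared against an exponential density with the same mean to see the equilibrium the abstract refers to.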