

Title: Depth-Separation with Multilayer Mean-Field Networks
The mean-field limit has been successfully applied to neural networks, leading to many results on optimizing overparametrized networks. However, existing works often focus on two-layer networks and/or require a large number of neurons. We give a new framework for extending the mean-field limit to multilayer networks, and show that a polynomial-size three-layer network in our framework can learn the function constructed by Safran et al. (2019), which is known to be not approximable by any two-layer network.
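To make the setting concrete, here is a minimal sketch (illustrative only, not the authors' construction) of a three-layer network under mean-field scaling, where each hidden layer's contribution is averaged over its width; the sizes and nonlinearity are assumptions for the example.

```python
import numpy as np

# Minimal sketch of a three-layer (two-hidden-layer) network with mean-field
# 1/m normalization at each hidden layer. Widths, input dimension, and the
# tanh nonlinearity are illustrative assumptions, not taken from the paper.
d, m1, m2 = 10, 512, 512
rng = np.random.default_rng(0)
W1 = rng.standard_normal((m1, d))    # first-layer weights, one row per neuron
W2 = rng.standard_normal((m2, m1))   # second-layer weights
a = rng.standard_normal(m2)          # readout weights

def forward(x):
    h1 = np.tanh(W1 @ x)             # first hidden layer
    h2 = np.tanh(W2 @ h1 / m1)       # mean-field averaging over layer 1
    return a @ h2 / m2               # mean-field readout over layer 2

print(forward(rng.standard_normal(d)))
```

Under this 1/m scaling, each layer behaves like an empirical average over its neurons, which is what lets a finite network be analyzed through a limit over distributions of weights.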
Award ID(s): 1845171
PAR ID: 10335916
Journal Name: International Conference on Learning Representations
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. In this paper, a multilayer perceptron (MLP)-type artificial neural network model with a back-propagation training algorithm is utilized to model bubble growth and bubble dynamics parameters in nucleate boiling with a non-uniform electric field. The influences of the electric field on different parameters that describe a bubble's behavior, including bubble waiting time, bubble departure frequency, bubble growth time, and bubble departure diameter, are considered. This study models the single-bubble dynamic behavior of R113 created on a heater in a non-uniform electric field by utilizing an MLP neural network optimized by four different swarm-based optimization algorithms: the Salp Swarm Algorithm (SSA), Grey Wolf Optimizer (GWO), Artificial Bee Colony (ABC) algorithm, and Particle Swarm Optimization (PSO). To evaluate model effectiveness, the mean-square error (MSE) of the artificial neural network model under each optimization algorithm is measured and compared. The results suggest that the optimal networks in the two-hidden-layer and three-hidden-layer models for the bubble departure diameter improve MSE by 33.85% and 35.27%, respectively, compared with the best response in the one-hidden-layer model. Additionally, for bubble growth time, the networks with two hidden layers and three hidden layers achieve 44.51% and 45.85% reductions in error, respectively, compared with the network with one hidden layer. For departure frequency, the error reduction in the two-layer and three-layer networks is 46.85% and 62.32%, respectively. For bubble waiting time, the best networks in the two-hidden-layer and three-hidden-layer models improve MSE by 52.44% and 62.27%, respectively, compared with the best one-hidden-layer (1HL) model response. Also, SSA and GWO compete well (comparable MSE) with PSO and ABC.
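For intuition about how a swarm-based optimizer can replace gradient training in this kind of study, here is a hedged toy sketch: a one-hidden-layer regression MLP whose flattened weights are tuned by a basic global-best PSO loop to minimize MSE. The data, network size, and PSO coefficients are invented for illustration and are unrelated to the boiling experiments.

```python
import numpy as np

# Toy example: PSO searching the weight space of a tiny MLP to minimize MSE.
# Everything here (data, sizes, hyperparameters) is an illustrative assumption.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4))         # stand-in input features
y = np.sin(X).sum(axis=1)                 # stand-in regression target

n_hidden = 8
n_params = 4 * n_hidden + n_hidden        # hidden weights (4 x 8) + output weights

def mse(theta):
    W1 = theta[:4 * n_hidden].reshape(4, n_hidden)
    w2 = theta[4 * n_hidden:]
    pred = np.tanh(X @ W1) @ w2
    return np.mean((pred - y) ** 2)

# Basic global-best PSO: inertia 0.7, cognitive/social coefficients 1.5.
n_particles, iters = 30, 200
pos = rng.standard_normal((n_particles, n_params))
vel = np.zeros_like(pos)
pbest, pbest_val = pos.copy(), np.array([mse(p) for p in pos])
gbest = pbest[pbest_val.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, n_params))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos += vel
    vals = np.array([mse(p) for p in pos])
    improved = vals < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
    gbest = pbest[pbest_val.argmin()].copy()

print("best MSE:", pbest_val.min())
```

The same loop applies to deeper MLPs by enlarging the parameter vector, which is how one-, two-, and three-hidden-layer models can be compared under a common optimizer.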
  2. We analyze the dynamics of finite-width effects in wide but finite feature-learning neural networks. Starting from a dynamical mean-field theory (DMFT) description of infinite-width deep neural network kernel and prediction dynamics, we provide a characterization of the O(1/width) fluctuations of the DMFT order parameters over random initializations of the network weights. Our results, while perturbative in width, unlike prior analyses, are non-perturbative in the strength of feature learning. We find that once the mean-field/µP parameterization is adopted, the leading finite-size effect on the dynamics is to introduce initialization variance in the predictions and feature kernels of the networks. In the lazy limit of network training, all kernels are random but static in time and the prediction variance has a universal form. However, in the rich, feature-learning regime, the fluctuations of the kernels and predictions are dynamically coupled, with a variance that can be computed self-consistently. In two-layer networks, we show how feature learning can dynamically reduce the variance of the final tangent kernel and final network predictions. We also show how initialization variance can slow down online learning in wide but finite networks. In deeper networks, kernel variance can dramatically accumulate through subsequent layers at large feature-learning strengths, but feature learning continues to improve the signal-to-noise ratio of the feature kernels. In discrete time, we demonstrate that large-learning-rate phenomena such as edge-of-stability effects can be well captured by infinite-width dynamics and that initialization variance can decrease dynamically. For convolutional neural networks trained on CIFAR-10, we empirically find significant corrections to both the bias and variance of network dynamics due to finite width.
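The central scaling claim can be written schematically; the display below is our paraphrase of a leading finite-width expansion in the paper's spirit, with notation of our own choosing rather than the paper's exact equations.

```latex
% Schematic finite-width expansion of a DMFT order parameter (notation ours):
% K_N(t) is an empirical kernel at width N and K_\infty(t) its deterministic
% infinite-width limit; the fluctuation term has O(1/N) variance over
% random initializations of the weights.
K_N(t) = K_\infty(t) + \frac{1}{\sqrt{N}}\,\kappa(t) + O\!\left(N^{-1}\right),
\qquad
\operatorname{Var}\!\left[K_N(t)\right] = \Theta\!\left(N^{-1}\right).
```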
  3. Neural networks with a large number of units admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of “overparameterized” models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this work, we propose a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this non-local dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of units. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated.
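As a caricature of the neuronal birth-death idea (a sketch under our own simplifying assumptions, not the paper's jump rates or numerical schemes), one can periodically resample the neuron population: neurons whose per-neuron potential is above average duplicate, those below average die, and the population is resampled back to a fixed size.

```python
import numpy as np

# Hedged sketch of a birth-death resampling step for a two-layer mean-field
# network. V is a per-neuron "potential" (e.g., minus that neuron's
# contribution to the loss); this duplication/killing rule is a simplified
# stand-in for the paper's jump process, not its exact rates.
rng = np.random.default_rng(2)

def birth_death_step(W, a, V, dt=0.1):
    m = len(a)
    p = dt * (V - V.mean())                           # signed birth/death propensity
    survive = rng.random(m) > np.clip(-p, 0.0, 1.0)   # death probability when p < 0
    duplicate = rng.random(m) < np.clip(p, 0.0, 1.0)  # birth probability when p > 0
    pool = np.concatenate([np.where(survive)[0], np.where(duplicate)[0]])
    idx = rng.choice(pool, size=m, replace=True)      # renormalize to m neurons
    return W[idx], a[idx]

# Toy usage with random particles and a random potential.
m, d = 256, 2
W = rng.standard_normal((m, d))
a = rng.standard_normal(m) / m
W, a = birth_death_step(W, a, V=rng.standard_normal(m))
```

Interleaving such resampling steps with ordinary gradient updates is the non-local ingredient: mass can jump between regions of parameter space instead of only flowing along gradients.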
  4. The multilayer network framework has served to describe and uncover a number of novel and unforeseen physical behaviors and regimes in interacting complex systems. However, the majority of existing studies are built on undirected multilayer networks, while most complex systems in nature exhibit directed interactions. Here, we propose a framework to analyze diffusive dynamics on multilayer networks consisting of at least one directed layer. We rigorously demonstrate that directionality in multilayer networks can fundamentally change the behavior of diffusive dynamics: from monotonic (in undirected systems) to non-monotonic diffusion with respect to the interlayer coupling strength. Moreover, for certain multilayer network configurations, the directionality can induce a unique superdiffusion regime for intermediate values of the interlayer coupling, wherein the diffusion is even faster than the theoretical limit for undirected systems, i.e., the diffusion in the integrated network obtained from the aggregation of each layer. We theoretically and numerically show that the existence of superdiffusion is fully determined by the directionality of each layer and the topological overlap between layers. We further provide a formulation of multilayer networks displaying superdiffusion. Our results highlight the significance of incorporating interaction directionality in multilevel networked systems and provide a framework to analyze dynamical processes on interconnected complex systems with directionality.
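To make the diffusion setup concrete, here is a hedged toy computation (ours, not taken from the paper): build the supra-Laplacian of a two-layer multiplex in which one layer is directed, and track the smallest nonzero real eigenvalue part, which sets the relaxation time scale, as the interlayer coupling Dx varies.

```python
import numpy as np

# Toy supra-Laplacian for a 2-layer multiplex with one directed layer.
# Network, sizes, and coupling values are illustrative assumptions.
def laplacian(A):
    return np.diag(A.sum(axis=1)) - A          # out-degree Laplacian

n = 5
rng = np.random.default_rng(3)
A1 = (rng.random((n, n)) < 0.5).astype(float)  # directed layer
np.fill_diagonal(A1, 0.0)
A2 = np.triu(np.ones((n, n)), 1)
A2 += A2.T                                     # undirected (complete) layer
L1, L2, I = laplacian(A1), laplacian(A2), np.eye(n)

for Dx in (0.1, 1.0, 10.0):
    # Intralayer diffusion plus interlayer coupling of strength Dx.
    supra = np.block([[L1 + Dx * I, -Dx * I],
                      [-Dx * I, L2 + Dx * I]])
    eig = np.linalg.eigvals(supra)
    lam2 = min(e.real for e in eig if e.real > 1e-9)
    print(f"Dx={Dx:5.1f}  lambda_2={lam2:.3f}")  # larger => faster diffusion
```

Comparing lambda_2 against its value for the aggregated (layer-summed) network is the kind of test that reveals the superdiffusion regime described above.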
  5. Beck, Jeff (Ed.)
    Characterizing metastable neural dynamics in finite-size spiking networks remains a daunting challenge. We propose to address this challenge in the recently introduced replica-mean-field (RMF) limit. In this limit, networks are made of infinitely many replicas of the finite network of interest, but with randomized interactions across replicas. Such randomization renders certain excitatory networks fully tractable at the cost of neglecting activity correlations, but with explicit dependence on the finite size of the neural constituents. However, metastable dynamics typically unfold in networks with mixed inhibition and excitation. Here, we extend the RMF computational framework to point-process-based neural network models with exponential stochastic intensities, allowing for mixed excitation and inhibition. Within this setting, we show that metastable finite-size networks admit multistable RMF limits, which are fully characterized by stationary firing rates. Technically, these stationary rates are determined as the solutions of a set of delayed differential equations under certain regularity conditions that any physical solution must satisfy. We solve this original problem by combining the resolvent formalism and singular-perturbation theory. Importantly, we find that these rates specify probabilistic pseudo-equilibria which accurately capture the neural variability observed in the original finite-size network. We also discuss the emergence of metastability as a stochastic bifurcation, which can be interpreted as a static phase transition in the RMF limits. In turn, we expect to leverage the static picture of RMF limits to infer purely dynamical features of metastable finite-size networks, such as the transition rates between pseudo-equilibria.
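The flavor of the stationary-rate computation can be conveyed by a toy fixed-point iteration (a sketch under our own simplified dynamics, not the paper's delayed differential equations or resolvent machinery): rates of units with exponential stochastic intensity are sought as a fixed point of r = exp(b + W r), damped for stability.

```python
import numpy as np

# Hedged toy version of an RMF-style self-consistency computation. The
# fixed-point map below is a simplified stand-in for the paper's equations;
# weights, baselines, and the damping schedule are illustrative assumptions.
rng = np.random.default_rng(4)
n = 10
W = rng.normal(0.0, 0.3, (n, n))   # mixed excitatory/inhibitory weights
b = rng.normal(-1.0, 0.2, n)       # baseline log-intensities

r = np.ones(n)                     # initial guess for stationary rates
for _ in range(500):
    drive = np.clip(b + W @ r, -20.0, 20.0)  # clip only for numerical safety
    r = 0.9 * r + 0.1 * np.exp(drive)        # damped fixed-point iteration
print("stationary rates:", np.round(r, 3))
```

Multistability in the real model would show up here as different fixed points reached from different initial guesses, matching the pseudo-equilibria discussed above.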