Motivated by both theory and practice, we study how random pruning of the weights affects a neural network's neural tangent kernel (NTK). In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version. The equivalence is established under two cases. The first main result studies the infinite-width asymptotics. It is shown that, given a pruning probability, for fully-connected neural networks with the weights randomly pruned at initialization, as the width of each layer grows to infinity sequentially, the NTK of the pruned neural network converges to the limiting NTK of the original network up to an extra scaling factor. If the network weights are rescaled appropriately after pruning, this extra scaling can be removed. The second main result considers the finite-width case. It is shown that, to ensure the NTK's closeness to its limit, the dependence of the width on the sparsity parameter is asymptotically linear as the NTK's gap to its limit goes down to zero. Moreover, if the pruning probability is set to zero (i.e., no pruning), the bound on the required width matches the bound for fully-connected neural networks in previous works up to logarithmic factors. The proof of this result requires developing a novel analysis of a network structure which we call mask-induced pseudo-networks. Experiments are provided to evaluate our results.
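As a rough numerical illustration of the first result, the sketch below compares the empirical NTK of a two-layer ReLU network with that of a randomly pruned copy whose surviving weights are rescaled by 1/sqrt(1-p). The architecture, widths, and parameters here are our own illustrative choices, not the paper's exact setup.

```python
import numpy as np

def empirical_ntk_entry(x1, x2, W, v, mask, scale):
    """NTK entry <grad f(x1), grad f(x2)> for the two-layer ReLU net
    f(x) = v . relu((scale * mask * W) @ x) / sqrt(m), differentiating
    w.r.t. v and the surviving (unmasked) first-layer weights only."""
    m = W.shape[0]
    Wm = scale * mask * W
    h1, h2 = Wm @ x1, Wm @ x2
    a1, a2 = np.maximum(h1, 0.0), np.maximum(h2, 0.0)
    d1, d2 = (h1 > 0).astype(float), (h2 > 0).astype(float)
    k_v = (a1 @ a2) / m                      # contribution of grads w.r.t. v
    # d f / d W_ij = scale * mask_ij * v_i * d_i * x_j / sqrt(m)
    k_w = (scale ** 2 / m) * ((v ** 2 * d1 * d2) @ (mask @ (x1 * x2)))
    return k_v + k_w

rng = np.random.default_rng(0)
d, m, p, trials = 500, 2000, 0.5, 20
x1 = rng.standard_normal(d); x1 /= np.linalg.norm(x1)
x2 = rng.standard_normal(d); x2 /= np.linalg.norm(x2)

dense, pruned = [], []
for _ in range(trials):
    W, v = rng.standard_normal((m, d)), rng.standard_normal(m)
    mask = (rng.random((m, d)) > p).astype(float)  # keep each weight w.p. 1 - p
    dense.append(empirical_ntk_entry(x1, x2, W, v, np.ones((m, d)), 1.0))
    pruned.append(empirical_ntk_entry(x1, x2, W, v, mask, (1 - p) ** -0.5))

print(np.mean(dense), np.mean(pruned))  # close for large width m
```

With the 1/sqrt(1-p) rescaling, the averaged pruned kernel tracks the dense one, consistent with the "extra scaling can be removed" statement above.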
Metastable spiking networks in the replica-mean-field limit
Characterizing metastable neural dynamics in finite-size spiking networks remains a daunting challenge. We propose to address this challenge in the recently introduced replica-mean-field (RMF) limit. In this limit, networks are made of infinitely many replicas of the finite network of interest, but with randomized interactions across replicas. Such randomization renders certain excitatory networks fully tractable at the cost of neglecting activity correlations, but with explicit dependence on the finite size of the neural constituents. However, metastable dynamics typically unfold in networks with mixed inhibition and excitation. Here, we extend the RMF computational framework to point-process-based neural network models with exponential stochastic intensities, allowing for mixed excitation and inhibition. Within this setting, we show that metastable finite-size networks admit multistable RMF limits, which are fully characterized by stationary firing rates. Technically, these stationary rates are determined as the solutions of a set of delayed differential equations under certain regularity conditions that any physical solutions shall satisfy. We solve this original problem by combining the resolvent formalism and singular-perturbation theory. Importantly, we find that these rates specify probabilistic pseudo-equilibria which accurately capture the neural variability observed in the original finite-size network. We also discuss the emergence of metastability as a stochastic bifurcation, which can be interpreted as a static phase transition in the RMF limits. In turn, we expect to leverage the static picture of RMF limits to infer purely dynamical features of metastable finite-size networks, such as the transition rates between pseudo-equilibria.
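For intuition, here is a crude discrete-time simulation of a small finite-size network of the general kind the RMF limit idealizes: point-process neurons with exponential stochastic intensities, reset at their own spikes. The reset rule, coupling, and parameters are our illustrative assumptions, not the model analyzed in the paper.

```python
import numpy as np

def simulate_rates(W, b, T=100_000, dt=1e-3, seed=0):
    """Discrete-time approximation of a point-process network with
    exponential stochastic intensities: neuron i spikes in [t, t + dt)
    with probability min(1, exp(u_i) * dt); u_i jumps by W[i, j] when
    neuron j spikes and resets to b_i at neuron i's own spike.
    Returns empirical firing rates (toy model, our assumptions)."""
    rng = np.random.default_rng(seed)
    u = b.astype(float).copy()
    counts = np.zeros(len(b))
    for _ in range(T):
        p = np.minimum(1.0, np.exp(u) * dt)
        spikes = rng.random(len(b)) < p
        counts += spikes
        u = u + W @ spikes.astype(float)  # synaptic jumps from presynaptic spikes
        u[spikes] = b[spikes]             # reset neurons that just spiked
    return counts / (T * dt)

W_exc = np.array([[0.0, 0.5], [0.5, 0.0]])  # mutual excitation
b = np.zeros(2)
r_coupled = simulate_rates(W_exc, b)
r_isolated = simulate_rates(np.zeros((2, 2)), b)
print(r_coupled, r_isolated)
```

An isolated neuron fires at rate exp(b_i); mutual excitation raises the empirical rates above that baseline, the kind of finite-size stationary statistics the RMF stationary rates are meant to capture.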
 Award ID(s):
 2113213
 NSF-PAR ID:
 10337566
 Editor(s):
 Beck, Jeff
 Date Published:
 Journal Name:
 PLOS Computational Biology
 Volume:
 18
 Issue:
 6
 ISSN:
 1553-7358
 Page Range / eLocation ID:
 e1010215
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We describe the convex semi-infinite dual of the two-layer vector-output ReLU neural network training problem. This semi-infinite dual admits a finite-dimensional representation, but its support is over a convex set which is difficult to characterize. In particular, we demonstrate that the nonconvex neural network training problem is equivalent to a finite-dimensional convex copositive program. Our work is the first to identify this strong connection between the global optima of neural networks and those of copositive programs. We thus demonstrate how neural networks implicitly attempt to solve copositive programs via semi-nonnegative matrix factorization, and draw key insights from this formulation. We describe the first algorithms for provably finding the global minimum of the vector-output neural network training problem, which are polynomial in the number of samples for a fixed data rank, yet exponential in the dimension. However, in the case of convolutional architectures, the computational complexity is exponential in only the filter size and polynomial in all other parameters. We describe the circumstances in which we can find the global optimum of this neural network training problem exactly with soft-thresholded SVD, and provide a copositive relaxation which is guaranteed to be exact for certain classes of problems, and which corresponds with the solution of Stochastic Gradient Descent in practice.
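The soft-thresholded SVD mentioned above is built on singular-value soft-thresholding, the proximal operator of the nuclear norm. A minimal sketch of that primitive follows (the paper's exact construction around it is not reproduced here):

```python
import numpy as np

def svd_soft_threshold(A, tau):
    """Shrink each singular value of A by tau and reassemble: the unique
    minimizer of 0.5 * ||X - A||_F^2 + tau * ||X||_* (nuclear-norm prox)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt  # scale columns of U, recombine

A = np.arange(12, dtype=float).reshape(3, 4)
X = svd_soft_threshold(A, 1.0)
```

Singular values below the threshold tau are zeroed out, so the operator simultaneously shrinks and low-rank-truncates its argument.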

We study a general setting of neutral evolution in which the population is of finite, constant size and can have spatial structure. Mutation leads to different genetic types ("traits"), which can be discrete or continuous. Under minimal assumptions, we show that the marginal trait distributions of the evolutionary process, which specify the probability that any given individual has a certain trait, all converge to the stationary distribution of the mutation process. In particular, the stationary frequencies of traits in the population are independent of its size, spatial structure, and evolutionary update rule, and these frequencies can be calculated by evaluating a simple stochastic process describing a population of size one (i.e., the mutation process itself). We conclude by analyzing mixing times, which characterize rates of convergence of the mutation process along the lineages, in terms of demographic variables of the evolutionary process.
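The size-one reduction can be checked numerically: simulate a neutral Moran-type process with a two-trait mutation matrix and compare the long-run trait frequencies with the mutation chain's stationary distribution. The update rule and parameters below are illustrative choices within the general setting, not the paper's specific model.

```python
import numpy as np

def moran_trait_frequencies(M, n=20, steps=100_000, burn=20_000, seed=1):
    """Neutral Moran process with two traits {0, 1}: each step, a uniform
    parent reproduces, the offspring mutates according to the mutation
    matrix M, and replaces a uniform individual. Returns the time-averaged
    trait frequencies after burn-in."""
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=n)
    freq1 = 0.0
    for t in range(steps):
        parent = pop[rng.integers(n)]
        # offspring keeps the parent's trait w.p. M[parent, parent], else flips
        child = parent if rng.random() < M[parent, parent] else 1 - parent
        pop[rng.integers(n)] = child
        if t >= burn:
            freq1 += pop.mean()
    freq1 /= steps - burn
    return np.array([1.0 - freq1, freq1])

M = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # stationary distribution (2/3, 1/3)
freqs = moran_trait_frequencies(M)
print(freqs)
```

The empirical frequencies approach the mutation chain's stationary distribution (2/3, 1/3), regardless of the population size n, as the result above predicts.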

We study games with nonlinear best response functions played on a network consisting of disjoint communities. Prior works on network games have identified conditions to guarantee the uniqueness and stability of Nash equilibria in a network without any community structure. In this paper we are interested in accomplishing the same for networks with a community structure; it turns out that these conditions are much easier to verify with certain community structures. Specifically, we consider multipartite graphs and show that the uniqueness and stability of Nash equilibria are related to matrices which are potentially much lower in dimension, on the order of the number of communities, compared to same-size networks without a multipartite structure, in which case such matrices have a dimension the size of the network. We further introduce a new notion of degree centrality to measure the importance and influence of a community in such a network. We show that this notion enables us to find new conditions for the uniqueness and stability of Nash equilibria.
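To see why community structure can shrink the matrices involved, consider a standard sufficient condition for uniqueness in network games of the form ρ(G) < 1 on the interaction matrix G (a generic stand-in for the paper's exact conditions). For a weighted complete bipartite network, the spectral radius of the full n × n matrix coincides with that of a 2 × 2 community-level quotient matrix:

```python
import numpy as np

a, b, w = 30, 50, 0.02   # community sizes and a uniform interaction weight
# full (a + b) x (a + b) interaction matrix of the complete bipartite network
G = np.zeros((a + b, a + b))
G[:a, a:] = w
G[a:, :a] = w
rho_full = max(abs(np.linalg.eigvals(G)))

# 2 x 2 quotient matrix under the community (equitable) partition:
# each community is summarized by its total incoming weight per member
Q = np.array([[0.0, w * b],
              [w * a, 0.0]])
rho_comm = max(abs(np.linalg.eigvals(Q)))
print(rho_full, rho_comm)  # both equal w * sqrt(a * b)
```

Checking ρ < 1 on the 2 × 2 matrix is trivial, while the full matrix grows with the network; this is the flavor of dimension reduction the paper exploits for multipartite structures.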

This paper is about a class of stochastic reaction networks. Of interest are the dynamics of interconversion among a finite number of substances through reactions that consume some of the substances and produce others. The models we consider are continuous-time Markov jump processes, intended as idealizations of a broad class of biological networks. Reaction rates depend linearly on "enzymes," which are among the substances produced, and a reaction can occur only in the presence of sufficient upstream material. We present rigorous results for this class of stochastic dynamical systems, the mean-field behaviors of which are described by ordinary differential equations (ODEs). Under the assumption of exponential network growth, we identify certain ODE solutions as being potentially traceable and give conditions on network trajectories which, when rescaled, can with high probability be approximated by these ODE solutions. This leads to a complete characterization of the ω-limit sets of such network solutions (as points or random tori). Dimension reduction is noted depending on the number of enzymes. The second half of this paper is focused on depletion dynamics, i.e., dynamics subsequent to the "phase transition" that occurs when one of the substances becomes unavailable. The picture can be complex, for the depleted substance can be produced intermittently through other network reactions. Treating the model as a slow–fast system, we offer a mean-field description, a first step to understanding what we believe is one of the most natural bifurcations for reaction networks.
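As a minimal concrete instance of such a continuous-time Markov jump process, the sketch below runs a Gillespie simulation of two-species interconversion A ⇌ B and compares its long-run average with the mean-field ODE fixed point. The reactions and rates are toy choices, far simpler than the enzyme-driven networks studied above.

```python
import numpy as np

def gillespie_interconversion(n_total, k1, k2, t_end, seed=0):
    """Gillespie simulation of A <-> B with propensities k1 * a and
    k2 * (n_total - a); the mean-field ODE is da/dt = -k1*a + k2*(n - a).
    Returns the time-averaged count of species A over [0, t_end]."""
    rng = np.random.default_rng(seed)
    a, t, acc = n_total, 0.0, 0.0
    while t < t_end:
        r1, r2 = k1 * a, k2 * (n_total - a)  # reaction propensities
        total = r1 + r2
        dt = min(rng.exponential(1.0 / total), t_end - t)
        acc += a * dt                         # time-weighted accumulation
        t += dt
        if t >= t_end:
            break
        a += -1 if rng.random() < r1 / total else 1  # fire one reaction
    return acc / t_end

avg_a = gillespie_interconversion(100, 1.0, 2.0, 200.0)
print(avg_a)  # near the mean-field fixed point n * k2 / (k1 + k2) = 200 / 3
```

The stochastic time average fluctuates around the ODE fixed point, illustrating the mean-field approximation of the jump process in the simplest possible setting.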