In this review, we survey some recent rigorous results on large-N problems in quantum field theory, stochastic quantization, singular stochastic partial differential equations (SPDEs), and their mean-field limits. In particular, we discuss the O(N) linear sigma model on the two- and three-dimensional torus. The stochastic quantization procedure leads to a coupled system of N interacting Φ⁴ equations. In d = 2, we prove bounds for the dynamics that are uniform in N and establish convergence to a mean-field singular SPDE. For large enough mass or small enough coupling, the invariant measures [i.e., the O(N) linear sigma model] converge in a Wasserstein distance to the massive Gaussian free field, the unique invariant measure of the mean-field dynamics. We also obtain tightness of certain O(N)-invariant observables as random fields in suitable Besov spaces as N → ∞, along with exact descriptions of the limiting correlations. In d = 3, the estimates become more involved because the equation is more singular; we discuss how to prove convergence to the massive Gaussian free field in this case. The proofs build on recent progress in singular SPDE theory and combine several new techniques, such as uniform-in-N estimates and dynamical mean-field theory. These results are based on joint papers with Scott Smith, Rongchan Zhu, and Xiangchan Zhu.
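For orientation, the coupled system referred to above can be written schematically as follows. This is only a formal sketch under standard conventions for the coupling; the precise form used in the papers involves Wick renormalization of the cubic terms in d = 2, 3.

```latex
% Formal sketch only: in d = 2, 3 the cubic terms require Wick renormalization.
\partial_t \Phi_i \;=\; \Delta \Phi_i \;-\; m^2 \Phi_i
  \;-\; \frac{1}{N}\Big(\sum_{j=1}^{N} \Phi_j^2\Big)\Phi_i \;+\; \sqrt{2}\,\xi_i,
  \qquad i = 1,\dots,N,
% with independent space-time white noises \xi_i. The mean-field SPDE obtained as
% N \to \infty replaces the empirical average of squares by an expectation:
\partial_t \Psi \;=\; \Delta \Psi \;-\; m^2 \Psi \;-\; \mathbb{E}\big[\Psi^2\big]\,\Psi \;+\; \sqrt{2}\,\xi .
```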
On adversarial robustness and the use of Wasserstein ascent-descent dynamics to enforce it
Abstract: We propose iterative algorithms to solve adversarial training problems in a variety of supervised learning settings of interest. Our algorithms, which can be interpreted as suitable ascent-descent dynamics in Wasserstein spaces, take the form of a system of interacting particles. These interacting particle dynamics are shown to converge toward appropriate mean-field limit equations in certain large-particle-number regimes. In turn, we prove that, under certain regularity assumptions, these mean-field equations converge, in the large-time limit, toward approximate Nash equilibria of the original adversarial learning problems. We present results for non-convex non-concave settings, as well as for non-convex concave ones. Numerical experiments illustrate our results.
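As a rough illustration of what such ascent-descent particle dynamics can look like, here is a minimal Python sketch on a toy robust logistic-regression problem. It is our own simplification, not the paper's algorithm: the learner is represented by an ensemble of weight particles, the adversary by per-sample perturbations constrained to an ℓ2 ball, and each player's update responds to the empirical distribution of the other.

```python
# Minimal sketch (our simplification, not the paper's algorithm): ascent-descent
# dynamics realized as interacting particles for a toy adversarial training problem.
# Learner particles W descend the robust logistic loss; a perturbation field Delta
# ascends the loss averaged over the learner ensemble (mean-field coupling).
import numpy as np

rng = np.random.default_rng(0)
n, d, P = 200, 5, 50                      # samples, dimension, learner particles
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d))       # labels from a random linear teacher

W = rng.normal(size=(P, d))               # learner particles
Delta = np.zeros((n, d))                  # adversarial perturbations, one per sample
eta_w, eta_delta, eps, steps = 0.05, 0.5, 0.3, 300

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(steps):
    Xp = X + Delta
    # adversary (ascent): each sample's perturbation climbs the logistic loss
    # averaged over the learner's particle ensemble, then is projected onto
    # the per-sample l2 budget eps.
    G = np.zeros_like(Delta)
    for p in range(P):
        s = sigmoid(-y * (Xp @ W[p]))
        G += (-(y * s))[:, None] * W[p][None, :]
    Delta += eta_delta * G / P
    norms = np.linalg.norm(Delta, axis=1, keepdims=True)
    Delta *= np.minimum(1.0, eps / np.maximum(norms, 1e-12))
    # learner (descent): every particle descends its loss on the perturbed data.
    Xp = X + Delta
    for p in range(P):
        s = sigmoid(-y * (Xp @ W[p]))
        W[p] -= eta_w * (Xp * (-(y * s))[:, None]).mean(axis=0)

robust_err = np.mean([(np.sign((X + Delta) @ w) != y).mean() for w in W])
print("mean robust training error across learner particles:", robust_err)
```

In the paper's framework both players are lifted to probability measures over their parameters; the finite ensembles above play that role only heuristically.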
- PAR ID: 10617705
- Publisher / Repository: Oxford University Press
- Date Published:
- Journal Name: Information and Inference: A Journal of the IMA
- Volume: 13
- Issue: 3
- ISSN: 2049-8772
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
A broad class of unsupervised deep learning methods such as Generative Adversarial Networks (GANs) involves training overparameterized models, where the number of parameters of the model exceeds a certain threshold. Indeed, most successful GANs used in practice are trained using overparameterized generator and discriminator networks, both in terms of depth and width. A large body of work in supervised learning has shown the importance of model overparameterization in the convergence of gradient descent (GD) to globally optimal solutions. In contrast, the unsupervised setting and GANs in particular involve non-convex concave min-max optimization problems that are often trained using Gradient Descent/Ascent (GDA). The role and benefits of model overparameterization in the convergence of GDA to a global saddle point in non-convex concave problems are far less understood. In this work, we present a comprehensive analysis of the importance of model overparameterization in GANs, both theoretically and empirically. We theoretically show that in an overparameterized GAN model with a 1-layer neural network generator and a linear discriminator, GDA converges to a global saddle point of the underlying non-convex concave min-max problem. To the best of our knowledge, this is the first result for global convergence of GDA in such settings. Our theory is based on a more general result that holds for a broader class of nonlinear generators and discriminators that obey certain assumptions (including deeper generators and random feature discriminators). Our theory utilizes and builds upon a novel connection with the convergence analysis of linear time-varying dynamical systems, which may have broader implications for understanding the convergence behavior of GDA for non-convex concave problems involving overparameterized models. We also empirically study the role of model overparameterization in GANs using several large-scale experiments on the CIFAR-10 and Celeb-A datasets. Our experiments show that overparameterization improves the quality of generated samples across various model architectures and datasets. Remarkably, we observe that overparameterization leads to faster and more stable convergence behavior of GDA across the board.
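To make the GDA update rule concrete, here is a small self-contained Python sketch with a one-layer linear generator and a linear discriminator. It is not the paper's model or proof setting: we add a small ℓ2 penalty on the discriminator so that plain simultaneous GDA converges on this toy problem (without it, GDA on the resulting bilinear game would cycle), whereas the paper analyzes convergence in the overparameterized regime directly.

```python
# Toy sketch of simultaneous Gradient Descent/Ascent (GDA): one-layer linear
# generator G(z) = W z + b vs. a linear discriminator D(x) = v.x.
# NOT the paper's setting: an l2 penalty on v is added so this bilinear-type
# game converges under plain GDA.
import numpy as np

rng = np.random.default_rng(1)
d, k, n = 3, 8, 2000
mu_star = np.array([1.0, -2.0, 0.5])                 # mean of the "real" data
X = rng.normal(size=(n, d)) + mu_star

W = rng.normal(scale=0.1, size=(d, k))               # generator weights
b = np.zeros(d)                                      # generator bias
v = np.zeros(d)                                      # linear discriminator
eta_g, eta_d, lam, steps = 0.05, 0.05, 0.5, 3000

for _ in range(steps):
    Z = rng.normal(size=(n, k))
    fake = Z @ W.T + b
    # objective: E[D(real)] - E[D(fake)] - (lam/2)||v||^2;
    # the discriminator ascends in v, the generator descends in (W, b).
    grad_v = X.mean(axis=0) - fake.mean(axis=0) - lam * v
    grad_W = -np.outer(v, Z.mean(axis=0))
    grad_b = -v
    v += eta_d * grad_v            # ascent step
    W -= eta_g * grad_W            # descent step
    b -= eta_g * grad_b

gen_mean = (rng.normal(size=(n, k)) @ W.T + b).mean(axis=0)
print("target mean:", mu_star, "generated mean:", np.round(gen_mean, 2))
```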
-
Many imaging problems can be formulated as inverse problems expressed as finite-dimensional optimization problems. These optimization problems generally consist of minimizing the sum of data fidelity and regularization terms. In Darbon (SIAM J. Imag. Sci. 8:2268–2293, 2015) and Darbon and Meng (On decomposition models in imaging sciences and multi-time Hamilton-Jacobi partial differential equations, arXiv preprint arXiv:1906.09502, 2019), connections between these optimization problems and (multi-time) Hamilton-Jacobi partial differential equations have been proposed under convexity assumptions on both the data fidelity and regularization terms. In particular, under these convexity assumptions, some representation formulas for a minimizer can be obtained. From a Bayesian perspective, such a minimizer can be seen as a maximum a posteriori estimator. In this chapter, we consider a certain class of non-convex regularizations and show that similar representation formulas for the minimizer can also be obtained. This is achieved by leveraging min-plus algebra techniques that were originally developed for solving certain Hamilton-Jacobi partial differential equations arising in optimal control. Note that connections between viscous Hamilton-Jacobi partial differential equations and Bayesian posterior mean estimators with Gaussian data fidelity terms and log-concave priors have been highlighted in Darbon and Langlois (On Bayesian posterior mean estimators in imaging sciences and Hamilton-Jacobi partial differential equations, arXiv preprint arXiv:2003.05572, 2020). We also present similar results for certain Bayesian posterior mean estimators with Gaussian data fidelity and certain non-log-concave priors using an analogue of min-plus algebra techniques.
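For the flavor of the connection, the standard single-time instance (our paraphrase in generic notation, not a formula quoted from the chapter) is the following; the min-plus observation is that for certain non-convex regularizers written as a minimum of convex pieces, the representation survives piecewise.

```latex
% Single-time instance of the connection, in generic notation (our paraphrase).
% Quadratic data fidelity with parameter t > 0 and regularizer J:
S(x,t) \;=\; \min_{u}\Big\{\, J(u) \;+\; \tfrac{1}{2t}\,\|x-u\|_2^2 \,\Big\},
\qquad
\partial_t S \;+\; \tfrac{1}{2}\,\|\nabla_x S\|_2^2 \;=\; 0,\qquad S(x,0)=J(x).
% For convex J the minimizer admits the representation u^*(x,t) = x - t\,\nabla_x S(x,t).
% Min-plus idea for certain non-convex J: if J = \min_i J_i with each J_i convex, then
% S(x,t) = \min_i S_i(x,t), a min-plus ("tropical") combination of the convex sub-problems.
```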
-
Federated learning is an important framework in modern machine learning that seeks to integrate the training of learning models from multiple users, each user having their own local data set, in a way that is sensitive to data privacy and to communication loss constraints. In clustered federated learning, one assumes an additional unknown group structure among users, and the goal is to train models that are useful for each group, rather than simply training a single global model for all users. In this paper, we propose a novel solution to the problem of clustered federated learning that is inspired by ideas in consensus-based optimization (CBO). Our new CBO-type method is based on a system of interacting particles that is oblivious to group memberships. Our model is motivated by rigorous mathematical reasoning, which includes a mean-field analysis describing the large number of particles limit of our particle system, as well as convergence guarantees for the simultaneous global optimization of general non-convex objective functions (corresponding to the loss functions of each cluster of users) in the mean-field regime. Experimental results demonstrate the efficacy of our FedCBO algorithm compared to other state-of-the-art methods and help validate our methodological and theoretical work.
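For concreteness, the sketch below shows a generic (anisotropic) CBO update in Python: particles are pulled toward a Gibbs-weighted consensus point and explore with noise proportional to their distance from it. This only illustrates the kind of interacting-particle system involved; the FedCBO algorithm itself additionally handles cluster assignment and communication constraints, which are omitted here.

```python
# Generic (anisotropic) consensus-based optimization step; illustration only,
# not the FedCBO algorithm (clustering and communication constraints omitted).
import numpy as np

def cbo_minimize(f, x0, alpha=30.0, lam=1.0, sigma=0.7, dt=0.05, steps=400, seed=0):
    """Drive particles x0 (shape: particles x dim) toward a weighted consensus point."""
    rng = np.random.default_rng(seed)
    X = np.array(x0, dtype=float)
    for _ in range(steps):
        fx = f(X)
        w = np.exp(-alpha * (fx - fx.min()))           # Gibbs weights, stabilized
        m = (w[:, None] * X).sum(axis=0) / w.sum()     # consensus point
        drift = X - m
        noise = rng.normal(size=X.shape)
        # drift toward the consensus point plus componentwise exploration noise
        X += -lam * dt * drift + sigma * np.sqrt(dt) * drift * noise
    return m

# usage on a non-convex Rastrigin-type objective in 2d (global minimum at the origin)
f = lambda X: np.sum(X**2 + 2.5 * (1.0 - np.cos(2.0 * np.pi * X)), axis=1)
x0 = np.random.default_rng(1).uniform(-3.0, 3.0, size=(200, 2))
print("approximate minimizer:", np.round(cbo_minimize(f, x0), 3))
```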
-
This work develops analysis and algorithms for solving a class of bilevel optimization problems where the lower-level (LL) problems have linear constraints. Most of the existing approaches for constrained bilevel problems rely on value function-based approximate reformulations, which suffer from issues such as non-convex and non-differentiable constraints. In contrast, in this work, we develop an implicit gradient-based approach, which is easy to implement and is suitable for machine learning applications. We first provide an in-depth understanding of the problem by showing that the implicit objective for such problems is in general non-differentiable. However, if we add a small (linear) perturbation to the LL objective, the resulting implicit objective becomes differentiable almost surely. This key observation opens the door for developing (deterministic and stochastic) gradient-based algorithms similar to the state-of-the-art ones for unconstrained bilevel problems. We show that when the implicit function is assumed to be strongly convex, convex, or weakly convex, the resulting algorithms converge at guaranteed rates. Finally, we experimentally corroborate the theoretical findings and evaluate the performance of the proposed framework on numerical and adversarial learning problems.
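In generic notation (ours, not taken from the paper), the setting and the perturbation idea described above can be summarized as follows.

```latex
% Schematic of the setting in generic notation (ours, not taken from the paper).
% Upper-level objective f, lower-level objective g with linear constraints:
\min_{x}\; \ell(x) \;:=\; f\big(x,\, y^{*}(x)\big),
\qquad
y^{*}(x) \;\in\; \operatorname*{arg\,min}_{y\,:\,A y \le b}\; g(x, y).
% Perturbation idea: replace g by g_q(x,y) = g(x,y) + q^{\top} y for a small random vector q.
% The perturbed implicit objective \ell_q(x) = f(x, y_q^{*}(x)) is then differentiable
% almost surely, and its gradient can be evaluated via the implicit function theorem,
% enabling deterministic and stochastic (implicit) gradient methods.
```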